The northern Indian state of Himachal Pradesh (which means, roughly, “land of snow”) is known for its fascinating Himalayan beauty. Full of scenic routes, gushing rivers, delicious fruits, and crystal-clear skies, it leaves an indelible imprint on everybody who’s been there.
From this high-altitude setting comes Dr Suvrit Sra, a Principal Research Scientist at the Laboratory for Information and Decision Systems (LIDS). Aptly, in his work on machine learning, optimization, and statistics, he is motivated to search for connections between different problems and topics – the bird's-eye perspective, so to speak.
Recently, he applied this perspective to uncover new mathematical techniques for manifold optimization. “Sometimes, when you develop new mathematical tools, you revisit old problems to see if there's a deeper, hitherto unknown, connection that can help you solve them better,” he says.
In this case, Suvrit and his colleague Reshad Hosseini, of the University of Tehran, took a new look at parameter estimation for Gaussian Mixture Models (GMMs).
In statistics, a mixture model helps to represent sub-populations or clusters amidst a larger general population. For instance, say you have a list of all housing sale transactions for the year; can you identify, by looking at the data, clusters of transactions for studio apartment sales or for single-family home sales?
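To make that concrete, here is a rough sketch of how such clusters might be found with a two-component GMM; the code, the made-up housing figures, and the use of scikit-learn's GaussianMixture are illustrative assumptions, not part of Suvrit's work:

# Illustrative sketch: clustering made-up housing sales with a two-component GMM.
# Assumes NumPy and scikit-learn are installed; all numbers are invented.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Fake transactions: [square footage, sale price in $1000s] for two sub-populations.
studios = rng.normal(loc=[500, 300], scale=[60, 40], size=(100, 2))
single_family = rng.normal(loc=[2200, 700], scale=[300, 120], size=(100, 2))
sales = np.vstack([studios, single_family])

gmm = GaussianMixture(n_components=2, random_state=0).fit(sales)
labels = gmm.predict(sales)   # which cluster each transaction most likely belongs to
print(gmm.means_)             # recovered cluster centers, roughly the two sub-populations

Each fitted component is a Gaussian “bump” over the data, and a transaction is assigned to whichever bump explains it best.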
For fitting Gaussian Mixture Models, the gold standard has long been an algorithm called expectation maximization, or EM. Suvrit and his colleague had been working on some techniques in non-Euclidean geometry, however, and their intuition suggested their geometric ideas could be applied to GMMs, too. Their challenge was to improve upon EM by building on a different optimization technique – Riemannian manifold optimization.
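For readers who want to see the machinery, a bare-bones version of the EM iteration for a one-dimensional, two-component GMM looks roughly like this; it is a simplified NumPy/SciPy sketch for illustration, not the method from Suvrit and Reshad's paper:

# Minimal EM sketch for a 1-D, two-component GMM (illustrative only).
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iter=50):
    mu = np.array([x.min(), x.max()], dtype=float)   # crude initialization
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])                        # mixture weights
    for _ in range(n_iter):
        # E-step: how responsible is each component for each data point?
        dens = np.stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations in closed form
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.0, 500)])
print(em_gmm_1d(data))   # the two means should land near 0 and 5

Riemannian manifold optimization, by contrast, treats the fit as a single optimization problem over a curved parameter space rather than this alternation of steps.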
Their first attempt, however, failed spectacularly. Expectation maximization was, after all, the gold standard, Suvrit says. Back at the blackboard, Suvrit and Reshad noticed a subtle differential-geometric point that had previously escaped them, one they thought might help them recover from the setback. And it did indeed! In fact, so much so that Suvrit says, “We're still trying to understand why it worked so well.”
In 2015, their work was accepted to the proceedings of the Neural Information Processing Systems (NIPS) conference, one of the largest and most prestigious machine-learning conferences in the world.
Suvrit joined LIDS quite recently, in January 2015. But his interest in computer science dates back to the late 1980s, when he was first introduced to computers in elementary school.
“I was probably 11 or so,” he says. “My school was just modern enough, and we were fortunate because even though they introduced computers to us, the first thing they taught us was programming, rather than how to use computer software – so I became interested in computer programming. I was very interested, as children are, in learning how to 'be a hacker', figuring out how to 'break' software, that kind of thing.”
That led him eventually to a PhD in computer science from the University of Texas at Austin, where he discovered his interest in machine learning and optimization. Subsequently, he took on a research position at the Max Planck Institute in Germany (a place he fondly remembers for its calm, deep research atmosphere) in the exceptionally pretty town of Tübingen; later he held visiting faculty positions at the University of California, Berkeley, and Carnegie Mellon University.
Currently, his work centers on optimization for machine learning – making machine-learning models and algorithms as sleek and efficient as possible. Today, machine learning, which refers to how computers learn from data without being given a full set of explicit instructions, is a much-bandied-about buzzword thanks to its growing popularity and range of applications.
This wasn't always the case, says Suvrit, even though the machine-learning field has been growing for over three decades now. “Around 2007 I was trying to gather support for bringing more optimization into machine learning, but I had a tough time finding co-organizers,” he says.
Take, for example, what's called deep learning. In deep learning one goes through a large dataset bit by bit, sampling small chunks to analyze, learning a little more each time, and continually updating a model. That enables a machine to find the correct result: trained on enough images of cats, it can learn to recognize a cat in a novel image, for instance. Or, taught to recognize angry tweets, it can uncover the same mood or political sentiment from a set of Twitter or Facebook posts.
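In code, that bit-by-bit learning loop looks roughly like the following minibatch sketch, a made-up linear model in plain NumPy that is purely illustrative rather than an actual deep network:

# Minibatch training sketch: sample a small chunk, compute a gradient, update the model.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))                 # made-up training examples
true_w = rng.normal(size=20)
y = X @ true_w + 0.1 * rng.normal(size=10_000)    # made-up targets

w = np.zeros(20)                                  # the model's parameters
lr, batch_size = 0.01, 64
for step in range(2_000):
    idx = rng.integers(0, len(X), size=batch_size)    # sample a small chunk of the data
    xb, yb = X[idx], y[idx]
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size      # gradient computed on that chunk only
    w -= lr * grad                                    # learn a little more each time

How the chunks are chosen, how big they are, and how large each update step should be are exactly the kinds of questions optimization theory helps answer.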
“If you go through the data in a careful manner, guided by theory, you can make more effective use of the data. That may end up cutting down the amount of time it takes to train a neural network, sometimes by hundreds of times,” Suvrit says. “Optimization is what puts life into the system.”
To attract and encourage other researchers, he began organizing a workshop, specifically on optimization for machine learning, at the annual NIPS conference. “Today, after almost nine years, a large number of the cutting-edge results in optimization are being generated by people in machine learning,” he says.
Now he is working on a number of applications in collaboration with other researchers and organizations.
For example, intensive care doctors have many patients; all their cases are, by definition, serious. Yet some of these cases are more complex to treat than others. By comparing a patient's data with a range of other cases with known outcomes, can machine learning help tell a relatively simple case apart from a more complex one? “Of course, doctors use their prior knowledge and experience to make decisions,” Suvrit says. “Can we help these expert doctors arrive at better decisions faster, to handle patients accurately and precisely?”
Another application is in smart, web-connected devices – the so-called Internet of Things. In a collaboration with local firm Analog Devices International, which builds sensors, processors, and other hardware, Suvrit aims to design algorithms to run well on tiny devices using very low battery power. Putting machine-learning capability on such minuscule hardware calls for different methods and techniques, he explains.
LIDS, and MIT in general, are dynamic places that encourage such diverse activity, he says. “One of the things I like about MIT is that it's a very high-energy place, and that suits my temperament. I don’t have to be embarrassed to be a math geek here!” Students help develop his ideas; projects with collaborators can sometimes arise spontaneously. “One of the hardest parts is picking which questions are worth answering.”
Beyond work, Suvrit enjoys hiking, dabbling in foreign languages (he taught himself German before he went to Germany), and Urdu poetry. He also tinkers with pure-mathematics problems for fun, often answering questions on the mathematics website MathOverflow.
“It's the only social media I care about,” he says. “While answering other people's questions on the website, I end up discovering answers to my own questions.” In mathematics and computer science, pure and applied, it seems, it helps to have that high-level view.