"When we observe the complex world around us, it is inevitable that we leave a good part of the possibilities unseen," said Mesrob Ohannessian, a final-year Ph.D. student in LIDS. He introduces his research with a basic example: birdwatchers counting birds in a park. "How would they estimate their chances of seeing a new species of bird?" he asked, adding, "We humans intuitively know that the possibility is out there, but it is not immediately clear how to quantify it."
Over the past five years, Mesrob has built his research at LIDS around questions of this kind: how to quantify the likelihood of events so rare that they are nearly impossible to foresee. His own encounter with LIDS was arguably such an event. He first learned about the lab through simple proximity. He recalls taking a class in stochastic processes, detection and estimation, a course that continues to guide how he thinks about statistics and probability. "When the lecturer learned of my fascination with the topic, he told me he worked in LIDS and recommended that I attend the lab seminars. I did so eagerly."
At the time, Mesrob had graduated from the American University of Beirut and was a master's student at MIT, designing and developing educational software for teaching electromagnetism to freshmen. He worked in Building 9, just a few short steps from Building 35, then the home of LIDS. "So it was especially convenient to get to those seminars, and also to interact with people in the lab." Through these visits he was introduced to LIDS professors Sanjoy Mitter and Munther Dahleh, from whom he gained a deeper understanding of the lab's research. Mesrob liked what he learned and formally joined the lab in 2006 under their supervision.
He soon became interested in rare events, and especially in the question of when one can actually say something meaningful about them. The birdwatcher example may seem esoteric, but it has very real consequences, and estimating the probability of rare events has a storied mathematical history. During World War II, the mathematician Alan Turing and his colleague Jack Good developed an algorithm for this unseen-species problem. The algorithm, which uses knowledge of almost-rare events to help predict truly rare ones, helped crack German Enigma ciphers. On a different front, the North Sea flood of 1953 spurred a number of major government initiatives in Europe. "They wanted to understand another kind of rare event: when natural phenomena, such as wind speeds or wave heights, exceed historical highs," Mesrob said.
His own motivation, however, was of a different sort: to help machines understand natural language. "When computers perform automatic speech recognition, it is crucial for them to estimate the number of words, or succession of words, that they have not seen or have seen very rarely during their training. If they do not take such ignorance into account, they will be overconfident and misinterpret," he said. So he began his research by examining whether the famous Good-Turing estimator was any better than simply answering "never" to the question of whether one would see a new outcome, that is, a new bird species in the park or a new word in a text.
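For readers curious about the mechanics: in its textbook form, the Good-Turing estimate of the chance that the next observation is something never seen before is the number of species (or words) seen exactly once, divided by the total number of observations. Answering "never" amounts to an estimate of zero. The Python sketch below is a minimal illustration of that standard formula, not Mesrob's own code; the sightings log and the function name `good_turing_missing_mass` are hypothetical.

```python
from collections import Counter


def good_turing_missing_mass(observations):
    """Textbook Good-Turing estimate of the probability that the next
    observation is a never-before-seen outcome:
    (number of outcomes seen exactly once) / (number of observations)."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


# Hypothetical birdwatching log: each entry is one sighting.
sightings = ["sparrow", "sparrow", "robin", "sparrow", "cardinal",
             "robin", "blue jay", "sparrow", "finch", "robin"]

print(good_turing_missing_mass(sightings))  # the "never" answer would be 0
```

Here three species (cardinal, blue jay, finch) are seen once in ten sightings, so the estimate is 3/10, whereas answering "never" gives 0.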
Mesrob says he soon made a critical observation: the Good-Turing estimator's advantage depends on how the outcomes look when they are arranged from most frequent to least frequent. If outcomes become less frequent slowly, the Good-Turing estimator is preferable. But if they become less frequent quickly, guessing that an unobserved outcome will never occur is about as good. This property of "becoming less frequent slowly" is often called a "heavy tail," because of the long, slowly thinning end it gives the distribution when graphed.
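A small simulation can illustrate this observation. The sketch assumes two hypothetical distributions over 1,000 outcomes ranked from most to least frequent: a heavy-tailed, Zipf-like one with probabilities proportional to 1/k, and a light-tailed, geometric one with probabilities proportional to 2^-k. It draws 1,000 samples from each and compares the true probability of the outcomes that went unseen with the textbook Good-Turing estimate and with the "never" answer of zero. The distributions, sample size, seed, and function names are illustrative assumptions, not taken from Mesrob's work.

```python
import random
from collections import Counter


def normalize(weights):
    """Turn nonnegative weights into probabilities that sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def unseen_mass_vs_good_turing(probs, n, rng):
    """Draw n samples from probs; return (true unseen mass, Good-Turing estimate)."""
    outcomes = range(len(probs))
    sample = rng.choices(outcomes, weights=probs, k=n)
    counts = Counter(sample)
    # True total probability of the outcomes that never appeared in the sample.
    true_unseen = sum(p for k, p in enumerate(probs) if k not in counts)
    # Good-Turing: fraction of the sample made up of outcomes seen exactly once.
    singletons = sum(1 for c in counts.values() if c == 1)
    return true_unseen, singletons / n


K, N = 1000, 1000         # number of possible outcomes and sample size (assumed)
rng = random.Random(0)    # fixed seed so the run is repeatable

# Both lists are ordered from most to least frequent outcome.
heavy = normalize([1 / k for k in range(1, K + 1)])      # Zipf-like: decays slowly
light = normalize([0.5 ** k for k in range(1, K + 1)])   # geometric: decays quickly

for name, probs in [("heavy tail", heavy), ("light tail", light)]:
    true_unseen, gt = unseen_mass_vs_good_turing(probs, N, rng)
    print(f"{name}: unseen mass={true_unseen:.4f}  "
          f"Good-Turing={gt:.4f}  'never'=0.0000")
```

With the heavy tail, a substantial share of probability sits on outcomes that were never observed, and the Good-Turing estimate tracks it; with the light tail, the unseen mass is tiny, so answering "never" is nearly as good.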
By making this distinction, Mesrob believes he has isolated a common principle underlying many other rare-event problems. Indeed, in the problem of exceeding historical highs, researchers had already identified heavy tails as a critical assumption that makes inference possible. But no one had highlighted their importance in estimating the probability of new outcomes.
Mesrob has therefore been working with power laws, special mathematical relationships that formalize the notion of heavy tails. Making such structural assumptions is necessary to infer anything about the world, he explains. "This manifests itself in our everyday life, just as much as it does in almost every field of science and engineering where measurements are made and estimates are needed." With this new perspective, Mesrob is assembling a framework in which one can not only analyze algorithms such as the Good-Turing estimator but also go beyond them, giving stronger and more accurate predictions.
Mesrob believes his work has many potential applications, such as evaluating faults in power grids, estimating the probability that a physical system evolves beyond its desirable range of operation, or modeling rare changes in financial markets. However, one of the first places where he wants to test the new framework is natural language modeling, which gave him his original impetus and where such techniques are already heavily used. Researchers there often patch over the gaps left by rarely seen data, a practice called smoothing, in a relatively ad hoc way. "In the framework that I am developing, there is potential to perform such smoothing implicitly, in a principled and predictable way, and by doing so to boost speech recognition and machine translation performance," he said. "Perhaps I am helping our artificial brethren get better at gauging risks and recognizing novelty, tasks that we do relatively effortlessly as humans."
Outside of his research, Mesrob enjoys the outdoors; bird watching is not one of his hobbies, but he camps and hikes. He and his girlfriend have also taken up vegetable gardening, which poses problems of its own. "Growing from seed to healthy plants is challenging, rewarding, and fun; also fickle and frustrating sometimes," he said, "but nothing beats home-grown tomatoes!"
Mesrob has also taken advantage of the open environment LIDS fosters, which has helped promote and complement diverse parts of his education. He gives the example of a student-run seminar course in his area of interest, which he co-organized in Fall 2007. "We used this opportunity to develop mini-curricula that helped get us up to speed with recent research topics that were not yet organized into traditional courses," said Mesrob. He also describes a summer study group with other LIDS students in which they taught each other topology. "These and other experiences, be they organized or not, are not unique to me, and LIDS accelerates its students' education very effectively in this way."
The lab has offered Mesrob support and inspiration as well. "LIDS provides an environment where engineering and mathematics meet vigorously. The foundations of the lab, the questions the people ask, remain very deeply entrenched in engineering and practical concerns," he said. He describes LIDS as a place whose culture insists that these questions be asked, and solved, with sound mathematics. "This is a skill that I think LIDS helps hone in everyone who passes through." Unlike the events he researches, this much is easy to predict: Mesrob will put that skill to frequent use as he journeys onward to his future career.