LIDS/ALL 2013

From "Likes" on Facebook to five-star movie ratings on Netflix and uploaded images in Google Photos, our world is virtually leaking raw data. Increasingly sophisticated technologies are exponentially increasing the amounts of it we can collect. This data deluge poses certain problems for science and engineering, namely that we are producing more data than we can store and analyze. For LIDS alum Martin Wainwright, the real challenge is making all that data useful. "Data, on its own, is not really interesting," he says. "Data contains information. Information is interesting." It's not the details on Facebook that advertisers need, but large patterns in user behavior. It's not the individual cases of HIV, but the vectors of transmission that hold value for epidemiologists.

Martin spends his time thinking about how to convert raw data to meaningful information. He designs algorithms to extract structure from massive strings of zeros and ones. The needs for this kind of work are many. For example, image processing requires ways to compress pixels; Netflix uses algorithms that can quickly assess people's eclectic movie choices and come up with tailored recommendations. But Martin works on the fundamentals of culling information from data. While these potential applications of his work drive the research, they don't determine the conceptual questions he asks.

Martin is a professor at UC Berkeley with a joint appointment in the Department of Electrical Engineering and Computer Science and the Department of Statistics. These days, he's spending his sabbatical here at LIDS. It's not far from home, actually, as Martin earned his PhD here in 2002. The institution has left an indelible mark on his career, and naturally, it's pulled him back. "Once you've been here, it's always in your blood," he says. During this extended visit, he is working on a book about high-dimensional data, teaching a special graduate topics class, and continuing his research with a group of his Berkeley students who followed him to MIT.

One of Martin's favorite algorithms to emerge from his group is one that extracts all the voting records of US senators from a large database publicly available at Senate.gov. When Martin demonstrates it, the complexity of the math seems to disappear. At the click of a button on his laptop, the installed algorithm spits out the image of a big circle. Tiny red and blue lines span the circle's diameter, like strings stretching across a dream catcher. Martin explains that the lines, red for Republican, blue for Democrat, and yellow for Independent, show actual relationships between the senators.

This algorithm, without knowing the political affiliations of the senators, has "learned" about them. It turns out that there are many more connections among Democrats than among Republicans. There is one highly connected node, reflecting many shared votes: it's Lieberman in 2006, right before he switched from Democrat to Independent. Martin says that this kind of algorithm reveals changes in data over time, which would be useful in extracting other interesting information from any kind of social network.
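The article does not say how the group's algorithm works, so the following is only a rough illustration of the general idea of recovering connections from votes alone, not Martin's method. It assumes a much simpler stand-in technique: link two senators whenever the fraction of roll calls on which they voted the same way exceeds a cutoff. The function name `agreement_graph`, the `threshold` parameter, and the four-senator toy records are all hypothetical.

```python
from itertools import combinations

def agreement_graph(votes, threshold=0.75):
    """Link every pair of voters whose agreement rate meets the threshold.

    votes maps a name to a list of 0/1 votes on the same roll calls.
    This is a toy stand-in for illustration, not the algorithm
    described in the article.
    """
    edges = []
    for a, b in combinations(sorted(votes), 2):
        pairs = list(zip(votes[a], votes[b]))
        agree = sum(x == y for x, y in pairs) / len(pairs)
        if agree >= threshold:
            edges.append((a, b))
    return edges

# Made-up toy records: no party labels are given to the function.
votes = {
    "A": [1, 1, 0, 1, 0],
    "B": [1, 1, 0, 1, 1],
    "C": [0, 0, 1, 0, 1],
    "D": [0, 0, 1, 0, 0],
}
edges = agreement_graph(votes)
print(edges)
```

Even in this toy version, two blocs emerge without any party labels being supplied, which is the sense in which such a method "learns" structure from the votes themselves.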

Martin's research could also apply to something called error control coding. When a satellite sends transmissions back to Earth, the data travels in a string of zeros and ones. During their journey, some of the data points will inevitably become corrupted; some zeros will become ones, for instance, and some digits will be erased. Engineers need to combat these errors of transmission by introducing redundancy, basically extra code. Back on Earth, the coding measure can help the corrupted code self-correct. This idea doesn't just apply to sending information to and from the far-flung reaches of space, though. Supermarket barcodes also contain an error control code. If a package is scanned off-center and records a number wrong, the extra checks figure out the mistake.
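The redundancy idea can be sketched with the simplest possible scheme, a repetition code: send every bit three times and let the receiver take a majority vote. This is an assumption chosen for illustration only; real satellite links and barcodes use far more efficient codes, and the function names `encode` and `decode` are hypothetical.

```python
def encode(bits, r=3):
    """Add redundancy by repeating each bit r times (r should be odd)."""
    return [b for b in bits for _ in range(r)]

def decode(received, r=3):
    """Recover each bit by majority vote over its r received copies."""
    return [
        int(sum(received[i:i + r]) > r // 2)
        for i in range(0, len(received), r)
    ]

message = [1, 0, 1, 1]
sent = encode(message)

# Simulate a noisy channel: flip one copy of the first and last bits.
corrupted = sent.copy()
corrupted[1] ^= 1
corrupted[9] ^= 1

recovered = decode(corrupted)
print(recovered == message)
```

The cost is that three times as many digits must be transmitted, and the scheme fails if two of the three copies of a bit are flipped. Practical codes aim for the same protection with much less overhead.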

Martin has a multidisciplinary background, and jokes about being a dilettante. After receiving his undergraduate degree in mathematics, he earned a master's in neuroscience from Harvard. The human brain may at first seem a big jump from theoretical algorithms, but Martin explains that the statistical analysis inherent in neuroscience modeling drew him into Information and Decision Systems. He is still fascinated by the brain, "the most sophisticated object on Earth," he calls it. "Computer vision systems come nowhere near what a simple visual system can do in our brain… It's interesting from an engineering standpoint because here is a system and we are not capable of engineering anything remotely close to it. At least not yet." In the future, it's a challenge he wants to tackle.

Martin says it might be possible to apply his current methods of high-dimensional data analysis to neuroscience. As it becomes possible to record the activity of different neurons simultaneously, something like the voting records algorithm could detect interesting patterns in the ways cells interact electrically. Similar to amassing Senate votes from consecutive years, Martin explains, recordings from neurons would provide spikes of data over time. Theoretically, the right algorithm could infer neural connections and patterns, the kind of information that would revolutionize our understanding of the brain.

If you sit right outside Martin's office, located centrally in the midst of other offices and cubicles in LIDS, you will hear streams of voices and the sounds of collaboration emerge from all directions. Martin likes this unique feature of LIDS. "People [here] like to study fundamental principles, but the applications they work on are really diverse: biological, social networks, aircraft control systems," he says. "So, there's common ground, and everyone has a shared way of thinking about things, but there's enough diversity as well that it makes for interesting interaction."

Being able to just walk around the lab and have productive conversations with other researchers is a huge benefit, as Martin's work is not always fun. Detecting order in the chaos of a data deluge is a "non-linear" experience. "You are struggling for a long time to formulate the right model, or to understand what key ingredients in problems are," Martin says. "But what keeps you addicted to it is that moment when you are working very hard, feeling like you are on a treadmill making no progress, but then suddenly there is a flash of insight. That's what keeps me coming back."