When Guy Bresler, Associate Professor of Electrical Engineering and Computer Science at MIT and a member of LIDS, travels to attend conferences, he often spends extra time in the area checking out the local rock climbing. He likes to pick routes that are just on the edge of possible for his strength and abilities, where there's uncertainty as to whether he can succeed.
In his first few attempts at a hard climb, the individual moves can feel impossible, and linking them together less probable still. Nevertheless, he keeps at it until he can (with luck) string together a sequence of moves that brings him to the top. One tough climb took him twenty tries, over ten separate occasions.
Guy’s attitude towards research is similar. He likes to tackle big questions for which the path to a solution is entirely unclear, with the only way forward being open-minded exploration.
"I like working on problems where the current situation is that we’re totally in the dark, it’s not even clear how to think about these questions, what the right tools are, or what the right way to start is," Guy says. “The risk is high, but personally, I find this style of research the most rewarding.”
One of the big questions Guy wants to tackle is that of statistical-computational tradeoffs in statistical inference problems: what causes these tradeoffs, and how can we find the tipping points, or phase transitions, at which they occur?
To understand his research, it helps to start with the basic question in mathematical statistics: how much data is needed to solve a problem? The minimum amount of data required to solve a problem is often called its statistical or information-theoretic limit. Many problems for which this minimum has been determined are not actually practically solvable with that amount of data, because the computational cost is too high: no known computationally efficient algorithm works there, so it might take a computer years to find the solution.
However, for many of these problems, if one gathered some further amount of data, the very nature of the problem would shift and the computational cost of solving it would go down. This higher data threshold, sometimes referred to as the computational limit, is, in layman’s terms, the amount of data needed to make the problem solvable on your laptop. Problems with a mismatch between their statistically and computationally optimal data requirements are said to have a statistical-computational gap, and they include some of the most basic and universal statistical procedures, such as low-rank matrix estimation, sparse phase retrieval, and robust sparse linear regression.
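To make the contrast concrete, here is a minimal sketch in Python; the model, problem sizes, and method names are illustrative assumptions, not taken from Guy’s papers. It uses sparse principal component analysis, a low-rank matrix estimation problem with an added sparsity constraint, and contrasts two styles of algorithm: an exhaustive search over candidate supports, which is statistically powerful but takes exponential time, and a very simple polynomial-time heuristic, which needs considerably more data before it starts to work.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def sparse_spike_data(n, p, k, theta):
    """Draw n samples from N(0, I + theta * v v^T), where v is a k-sparse unit vector."""
    support = rng.choice(p, size=k, replace=False)
    v = np.zeros(p)
    v[support] = 1.0 / np.sqrt(k)
    cov = np.eye(p) + theta * np.outer(v, v)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    return X, set(support.tolist())

def brute_force_support(X, k):
    """Exhaustive search over all k-subsets of coordinates: statistically
    powerful with little data, but the running time grows combinatorially in p."""
    n, p = X.shape
    S = X.T @ X / n                                   # sample covariance
    best, best_val = None, -np.inf
    for subset in itertools.combinations(range(p), k):
        top_eig = np.linalg.eigvalsh(S[np.ix_(subset, subset)])[-1]
        if top_eig > best_val:
            best, best_val = set(subset), top_eig
    return best

def diagonal_thresholding(X, k):
    """A very simple polynomial-time method: keep the k coordinates with the
    largest sample variance. It succeeds only once there is enough data."""
    return set(np.argsort(X.var(axis=0))[-k:].tolist())

# Illustrative sizes only; the real limits are statements about how n must grow.
X, truth = sparse_spike_data(n=80, p=30, k=3, theta=2.0)
print("true support       :", sorted(truth))
print("exhaustive search  :", sorted(brute_force_support(X, k=3)))
print("simple thresholding:", sorted(diagonal_thresholding(X, k=3)))
```

The sketch is only about the difference in algorithmic style; the statistical and computational limits themselves are asymptotic statements about how the amount of data must scale with the dimension, the sparsity, and the signal strength.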
Finding such problems’ computational limits would provide valuable information to data scientists. If a scientist knew that collecting twice as much data would make solving a problem exponentially easier computationally, that knowledge could, at a minimum, yield huge savings in cost and energy usage.
Guy hopes to find computational limits for as many problems as possible, but at a more basic level he wants to understand what changes about the nature of these problems as they cross the computational limit and become solvable by efficient algorithms. The shift is not simply a matter of scale, in which having more data shortens the time it takes the same approach to find a solution. Instead, the structural properties of the problem change, enabling the use of completely different approaches, often quite simple ones, to solve it.
Even more than understanding individual problems, Guy is interested in finding new connections between seemingly different ones. Are there commonalities across problems that explain the qualitative shift they undergo at the computational limit? If there are commonalities, could discovering them allow researchers to more efficiently determine the computational limits and best algorithmic approaches for new problems?
To answer these questions, Guy wants to create a comprehensive average-case complexity theory for statistical inference problems. Such a theory would categorize statistical problems according to whether or not they are efficiently solvable, much as the classical theory of P versus NP does for combinatorial problems. The challenge in relating statistical problems is that their inputs are data described by probability distributions, so connecting them also requires precisely relating these distributions, a delicate task. Demonstrating equivalence between complex problems is difficult, but it has the potential for a huge payoff: it would allow researchers to extend knowledge about one problem’s statistical and computational limits to others, and to transfer tools and approaches from better-understood problems to less-understood ones.
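As a hedged cartoon of why precisely relating distributions is delicate (the mapping and distributions below are assumptions chosen for illustration, not the actual reductions developed in this line of work): imagine trying to convert a problem whose noise entries are fair ±1 coin flips into one whose noise entries are supposed to be exactly Gaussian, simply by adding Gaussian noise to each entry. Even after matching the mean and variance, the result is a two-component Gaussian mixture rather than a Gaussian, and the leftover mismatch, measured in total variation distance, is a statistical fingerprint that a clever algorithm could in principle exploit. Careful reductions have to drive exactly this kind of mismatch down to essentially nothing.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at the points x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def tv_mismatch(sigma):
    """Total variation distance between the naive map's null distribution
    (a fair +/-1 coin flip plus N(0, sigma^2) noise, i.e. an equal-weight
    two-component Gaussian mixture) and the exactly Gaussian null it is meant
    to imitate, matched to the same mean 0 and variance 1 + sigma^2."""
    x = np.linspace(-12.0, 12.0, 200_001)   # fine grid; both densities vanish outside
    dx = x[1] - x[0]
    mixture = 0.5 * (gaussian_pdf(x, -1.0, sigma**2) + gaussian_pdf(x, 1.0, sigma**2))
    target = gaussian_pdf(x, 0.0, 1.0 + sigma**2)
    return 0.5 * np.sum(np.abs(mixture - target)) * dx

# The mismatch shrinks as the added noise grows, but it never vanishes entirely,
# which is why real reductions rely on more careful distribution-matching machinery.
for sigma in [0.5, 1.0, 2.0, 4.0]:
    print(f"sigma = {sigma:3.1f}   TV mismatch ~ {tv_mismatch(sigma):.4f}")
```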
Building a comprehensive theory of this scale is a huge undertaking, and as such it requires collaboration. Guy’s closest collaborator on this project was his graduate student Matthew Brennan, who passed away in early 2021. Guy still grapples with the loss of Matt as a friend, a rising star in the field, and a partner in the work, and for a while he had to set the project aside. In recent months, however, he has turned his attention to it again. In the fall of 2021 he led a program, “Computational Complexity of Statistical Inference,” at the Simons Institute at the University of California, Berkeley. The program hosted researchers from a variety of adjacent fields to work on the topic.
“We learn the most and have the most fun when we work as a team,” Guy says, explaining one of the key pieces of his approach: whether in climbing or in research, when you take on a challenge that tests your limits, the best way to succeed is to tackle it together.