LIDS/ALL 2021

When Mengdi Wang SM ’10 PhD ’13 was a graduate student at LIDS, she took a course on poker. Mengdi played often at MIT, honing her strategy, and she was soon winning cash prizes in large poker tournaments. Now an associate professor of electrical and computer engineering at Princeton University, Mengdi does not play much poker these days. But when she recalls her training at LIDS, those poker games remain a favorite memory.

Reinforcement learning, the field of machine learning in which Mengdi works, also spent much of its early years focused on how to win games. Now, researchers, including Mengdi, are working to improve the methodology behind reinforcement learning to make it scalable, efficient, and generalizable, so that it can be applied in fields ranging from medicine to finance to robotics.

In reinforcement learning, an AI agent is presented with an environment in a certain state (say, a game of poker in which it holds a pair of aces) and must choose an action to take: in this case, check, call, raise, or fold. The AI receives a positive or negative reward for its actions, and its goal is to maximize its cumulative reward. In this way, the AI learns through trial and error, not unlike a human child, and develops its own rules for maximizing rewards. Once trained, such algorithms can string together an optimal sequence of decisions in a very complex environment, developing strategies that surpass those of humans. AIs trained with reinforcement learning have bested human champions in games from chess to Go to StarCraft. A single algorithm, DeepMind’s MuZero, used reinforcement learning to master sixty different games without being given any of their rules at the start.
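To make that trial-and-error loop concrete, here is a minimal sketch in Python of tabular Q-learning, one textbook reinforcement learning method (not necessarily the one used in any system mentioned here). The toy environment, a five-cell corridor that pays a reward only when the agent reaches the far end, and every name and parameter in the code (`corridor_step`, `q_learning`, the learning rate, discount factor, and exploration rate) are hypothetical choices made for illustration.

```python
import random

N_STATES = 5      # hypothetical corridor cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def corridor_step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    if action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)  # moving left from cell 0 stays in cell 0
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only for reaching the goal
    return nxt, reward, done

def greedy(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action], starts at zero
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each episode
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            nxt, reward, done = corridor_step(state, action)
            # Nudge Q toward the reward plus the discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q = q_learning()
# Learned action in each non-terminal cell (0 = left, 1 = right).
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

No rule saying "go right" is ever written down. The agent discovers it because moving right is what eventually earns the reward, and the discount factor makes shorter paths to the reward worth more.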

The strengths that enable these algorithms to best human players also make them alluring potential tools in other fields. “That gives researchers the hope that computers could surpass humans in many other domains, and help us make better decisions,” Mengdi says.

The potential uses for reinforcement learning are manifold. It has been applied, with varying success, to smart device networks, stock trading algorithms, natural language processing, efficient energy systems, self-driving cars, robotics, and more.

However, there are challenges to expanding the use of reinforcement learning. When Mengdi began exploring this area, she found many open theoretical questions in the field. She describes her work as establishing the foundational methodology for reinforcement learning algorithms in order to help unlock their full potential. Although she is still early in her career, Mengdi has already made significant contributions to reinforcement learning methodology and theory and has developed algorithms to solve a variety of practical problems.

One of Mengdi’s key interests is efficiency: are algorithms making the best use of data and experiments? Training tends to require an algorithm to search large and complex problem spaces, which consumes a great deal of energy and computing time. Mengdi’s group has developed methods that provably enable algorithms to efficiently determine the optimal strategy, or policy, to follow.

Another of Mengdi’s interests is moving beyond additive rewards, which have been the cornerstone of reinforcement learning but have limitations. Game strategies are often easy to deconstruct into discrete moves that can be assigned values. However, some complex problems cannot be so easily deconstructed. Furthermore, researchers may want the AI to incorporate other factors, such as exploration, risk, and prior human knowledge, into its optimal policy. Mengdi has made progress in this area as well: “We show that with not too complex modifications to existing algorithms, one can actually extend the entire problem domain to solve problems with complex utilities that are not additive,” Mengdi says.
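One common way to write down the difference between additive and non-additive objectives is sketched below; it is an illustrative framing, not necessarily the exact formulation used in Mengdi's work. The standard objective sums discounted per-step rewards r(s, a), which is the same as a linear function of the policy's normalized discounted state-action occupancy measure λ^π (how often, with discounting, the policy visits each state-action pair). A general utility replaces that linear function with some other function F, for example an entropy term that rewards spreading visits across many states (exploration), or a negative divergence from the occupancy measure λ_E of a human demonstrator (prior human knowledge). The symbols F and λ_E are assumptions introduced here.

$$
\lambda^\pi(s,a) = (1-\gamma)\sum_{t=0}^{\infty} \gamma^t \Pr_\pi(s_t = s,\ a_t = a),
\qquad
J_{\text{additive}}(\pi) = \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty}\gamma^t r(s_t,a_t)\Big] = \frac{1}{1-\gamma}\sum_{s,a}\lambda^\pi(s,a)\, r(s,a)
$$

$$
J_{\text{general}}(\pi) = F(\lambda^\pi),
\qquad \text{e.g.}\quad
F(\lambda) = -\sum_{s,a}\lambda(s,a)\log\lambda(s,a)
\quad\text{or}\quad
F(\lambda) = -\sum_{s,a}\lambda(s,a)\log\frac{\lambda(s,a)}{\lambda_E(s,a)}
$$

Neither example of F can be written as a sum of rewards earned one step at a time, which is why algorithms built only for the additive case do not apply to them directly.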

Mengdi has also tackled how to train an AI with small or sparse datasets. Gaming algorithms can be trained in a simulator that allows them to run through many, many iterations of the game and so to encounter most of the possible states they will ultimately face as they develop their optimal strategy. In fields such as medicine, the problem space tends not to be so well mapped out, and so the algorithm must develop a strategy based on limited information. Mengdi has experienced this issue firsthand. When she worked with medical insurance companies to optimize the treatment plan for people receiving knee replacement surgery, the only data available were several hundred clinical claims, a very small dataset from an AI perspective. Mengdi’s algorithm was still able to improve on previous strategies, but not by as much as she thinks it could have with better data. Mengdi’s group is working on methodology to help algorithms succeed when faced with such data limitations.

“I think the real impact will be when we make reinforcement learning work on real systems that cannot be simulated like computer games, so that we can solve practical problems much better,” Mengdi says.

As Mengdi works on advancing the methodology in her field, she also celebrates the field’s growing inclusivity. When she was starting out, there were few women faculty members whose example she could follow. Female students entering the field now have many role models, Mengdi among them, to show them that success is possible.

“In academics there is this problem of gender bias and gender imbalance. Machine learning, perhaps because it is a newer field, I think is more diverse, and it’s been exciting to see that diversity growing in recent years,” Mengdi says. “I work with amazingly good female students and researchers, and I cannot wait to see what the younger generation achieves.”

For young researchers and future algorithms alike, Mengdi believes that the sky is the limit.