Most people don’t give much thought to the ways in which an e-commerce platform like Amazon generates its recommendations for users. Christina Lee considers this topic often, however.

As part of her research into statistical algorithms, she thinks about how to design them to make the best use of the social data these platforms gather. Platforms like Amazon, Etsy, and eBay use the human-generated data they collect (from each user's browsing and purchasing decisions) to encourage those users to view and buy other items. Christina looks at the scalable statistical algorithms behind all of this.

It was during her undergraduate work in computer science at the California Institute of Technology that Christina first became interested in research. "My mentor in undergrad was Adam Wierman, and he was a very energetic and motivating professor. Everything he talked about sounded really exciting, and he taught a course about network science, which includes modeling how networks grow and form, what properties arise from different models, and how they affect interactions over a network." When she graduated in 2011 and it was time to choose a graduate program, she came to MIT for a PhD in Electrical Engineering and Computer Science. She sought out LIDS specifically for its interdisciplinary approach to research. "I felt like it was a very unique lab that had both rigorous theory and a lot of interesting applications," she says. "I definitely came to work with my current advisors, Asu Ozdaglar and Devavrat Shah, as well. They both work on research related to networks but with very different approaches and perspectives, which is really unique."

Christina's recent research has centered around collaborative filtering, which is basically a way for a system to make automatic predictions about its users' interests. It does this by gathering and comparing the interests of as many of its users as possible in order to find other similar users. "It's a really simple, intuitive idea," says Christina. "This idea of collaborative filtering has been used already in many applications, where you discover similarities between users or between products, and these similarities allow you to then aggregate the information to better predict what the user's preference might be. But although they've been used a lot in the past, there hasn't been much theoretical foundation to explain why this seems to perform well in practice." This is where her research comes in. Her goal is twofold: to develop the theory explaining why this method works for a general model of the data, and to identify the assumptions about the data needed to prove that it works. With that theoretical underpinning in place, it becomes possible to improve the algorithm through analysis rather than guesswork.
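The basic intuition behind collaborative filtering can be seen in a minimal sketch. This is not Christina's algorithm, just the textbook user-based variant: to guess how a user would rate an unseen item, weight other users' ratings of that item by how similar their rating histories are. The function names and toy data here are purely illustrative.

```python
def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sum(a[i] ** 2 for i in common) ** 0.5
    norm_b = sum(b[i] ** 2 for i in common) ** 0.5
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    """Predict `user`'s rating of `item` as a similarity-weighted
    average of other users' ratings of that item."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        w = cosine_similarity(ratings[user], other_ratings)
        num += w * other_ratings[item]
        den += w
    return num / den if den else None

# Toy rating data: user -> {item: rating}
ratings = {
    "alice": {"book": 5, "lamp": 3},
    "bob":   {"book": 5, "lamp": 3, "mug": 4},
    "carol": {"book": 1, "mug": 2},
}
print(predict_rating(ratings, "alice", "mug"))  # → 3.0
```

The sketch also hints at why theory is needed: it "works" on this toy example, but nothing here says what properties of the data make the similarity-weighted average a good estimator, which is precisely the question her research addresses.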

The other crucial aspect of the recommendations made by e-commerce systems is the density of the data. If the data is too sparse, it's difficult to use it to make any reasonably accurate comparisons. However, Christina says, "We have been able to design new algorithms that are able to handle sparser data sets. The intuitive idea there is that if a pair of users do not have any commonly rated products, you would look at not only first order information but also higher order information associated with the users. For instance, not just the products the pair of users directly rated, but also the set of users who subsequently rated those products and the set of products that this larger set of users rated. So you would look at information that was associated with a user indirectly through a path in the data." Building these data paths allows for ferreting out similarities even when there are few or no shared ratings. This is especially useful for a platform such as eBay or Etsy, where purchases tend to be more individualized because each item is sold to a limited number of users, making links between users harder to find.
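The path idea Christina describes can be sketched by walking the user-item bipartite graph: expand each user's rated items through the users who also rated them, then compare the expanded neighborhoods. The expansion depth and the Jaccard comparison below are my own illustrative choices, not her published algorithm.

```python
def reachable_items(ratings, user, hops=2):
    """Items reachable from `user` by walking user -> item -> user -> item...
    in the bipartite rating graph. hops=1 gives the directly rated items."""
    items = set(ratings[user])
    for _ in range(hops - 1):
        # Users who rated any of the current items...
        users = {u for u, r in ratings.items() if items & set(r)}
        # ...and everything those users rated.
        items = {i for u in users for i in ratings[u]}
    return items

def path_similarity(ratings, u, v, hops=2):
    """Jaccard overlap of the two users' expanded item neighborhoods."""
    a, b = reachable_items(ratings, u, hops), reachable_items(ratings, v, hops)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy data: u1 and u3 share no rated items, but u2 links them.
ratings = {
    "u1": {"hat": 5},
    "u2": {"hat": 4, "scarf": 5},
    "u3": {"scarf": 3},
}
print(path_similarity(ratings, "u1", "u3", hops=1))  # → 0.0 (no direct overlap)
print(path_similarity(ratings, "u1", "u3", hops=2))  # → 1.0 (linked via u2)
```

At one hop the two users look entirely dissimilar; at two hops the path through u2 reveals a connection, which is exactly the kind of indirect information that makes comparisons possible on sparse data.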

The latest project at the forefront of Christina's mind is a little bit different. She's exploring the interplay between the machine learning and statistical algorithms used to generate recommendations, and a human behavioral model that tries to account for how people make decisions and react to recommendations. "Consider a user who visits the system in search of a product," she says. "The system makes a recommendation that might come alongside some information such as previous ratings for this product and historical information about other users' actions. The user will react to this information and make a decision: should I buy it or not, and if I buy it how do I rate it? Through these decisions, information is sent back into the system. This data is then collected and used by the system to predict user preferences and make future recommendations." One could imagine a worst-case scenario where these feedback loops in the data flow cause product ratings to become artificially inflated or deflated. Ideally, a recommendation system should be designed in such a way as to incorporate the human decision factor in how it analyzes the data it receives, making it easier for users to find products that match their preferences and learning the product qualities as accurately as possible. This is not a simple thing to do in practice, of course, because human behavior is hard to model.
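A toy simulation can illustrate the feedback loop she describes. In this sketch (an assumption of mine, not a model from her work), each new user's submitted rating blends a noisy private signal of the product's true quality with the average rating the system currently displays; the blend weight controls how strongly earlier ratings influence later ones.

```python
import random

def simulate(true_quality, n_users, herd_weight, seed=0):
    """Average displayed rating after n_users, each blending a private
    signal with the currently shown average (weight = herd_weight)."""
    rng = random.Random(seed)
    ratings = []
    for _ in range(n_users):
        private = true_quality + rng.gauss(0, 0.5)  # noisy private signal
        shown = sum(ratings) / len(ratings) if ratings else private
        ratings.append((1 - herd_weight) * private + herd_weight * shown)
    return sum(ratings) / len(ratings)

# With no herding, the average tracks the true quality; with strong
# herding, early noise gets amplified and locked into later ratings.
independent = simulate(3.0, 2000, herd_weight=0.0)
herded = simulate(3.0, 2000, herd_weight=0.9, seed=1)
```

Even this crude model shows why the problem is subtle: the system only ever observes the blended ratings, so recovering the true quality requires explicitly modeling how users react to what they are shown.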

In the future, Christina hopes to explore other applications for this type of collaborative filtering, such as those in the healthcare industry. She's already considered the possibilities related to projects like predicting protein or gene interactions: "You can run experiments to test the interactions of proteins, but it might be really expensive to test every single pair of proteins. So they will often select a small set of experiments to run. Given that you have incomplete information about this network, the question becomes, can I use this incomplete information to predict interactions between untested pairs of proteins? What's the next experiment I should do that will hopefully be more useful?" Data sparsity is frequently an issue in this application, so it's not difficult to see how her research could play a role in addressing that. But she's interested in a variety of different paths after LIDS, from healthcare to incorporating human behavior and game theory in understanding recommendation systems, to continuing her current work on understanding statistical algorithms for general models.
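The protein-interaction setting maps naturally onto the same neighborhood idea: a partially observed interaction matrix plays the role of the sparse rating matrix. The sketch below is illustrative only, under the assumption that proteins with similar observed interaction profiles behave similarly on untested pairs; the data and scoring are made up.

```python
def profile_similarity(obs, a, b):
    """Fraction of commonly tested partners on which proteins a and b agree."""
    common = set(obs.get(a, {})) & set(obs.get(b, {}))
    if not common:
        return 0.0
    agree = sum(obs[a][p] == obs[b][p] for p in common)
    return agree / len(common)

def predict_interaction(obs, a, b):
    """Score the untested pair (a, b) by how strongly proteins with
    profiles similar to a's are observed to interact with b."""
    num = den = 0.0
    for other in obs:
        if other in (a, b) or b not in obs.get(other, {}):
            continue
        w = profile_similarity(obs, a, other)
        num += w * obs[other][b]
        den += w
    return num / den if den else None

# obs[x][y] = 1 if proteins x and y were tested and interact, 0 if tested
# and do not interact; untested pairs are simply absent.
obs = {
    "p1": {"p3": 1, "p4": 0},
    "p2": {"p3": 1, "p4": 0, "p5": 1},  # same tested profile as p1
}
print(predict_interaction(obs, "p1", "p5"))  # → 1.0
```

A score like this could also guide which experiment to run next, e.g. by prioritizing untested pairs whose predictions are least certain, which is the experiment-selection question she raises.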

Christina says the intellectual foundation she's built in her time at LIDS has been as fundamental to her work as the support and assistance she's received from advisors, staff, and other students there. "It's not just about getting an algorithm to work for a particular data set and getting cool plots or charts. We really want to understand, why is this working, and what are the things that would break it?" she says. "At the same time, with my advisors, even though we care a lot about theory, they're still always asking the question of what is the impact and why do we care about this? We're not only designing algorithms that we can analyze and get nice theory results, but are also always asking: Is this algorithm practical in terms of implementation, is it scalable to our data sets? I really appreciate that aspect, looking for simple, elegant solutions that we are able to explain clearly." The open, welcoming community she has found among the other female students in particular, as well as the unwavering support and encouragement of her advisors, has helped her feel at home and inspired.

When she's not busy in the lab, Christina takes part in activities with her church and loves cooking and sharing meals with friends. Even in the realm of food, her passion for discovery stands out. "I like grocery shopping," she says with a laugh. "You can think about the possibilities."