Jennifer Wortman Vaughan


Jenn Wortman Vaughan is a Senior Researcher at Microsoft Research, New York City. Her research background is in machine learning and algorithmic economics. She is especially interested in the interaction between people and AI, and has often studied this interaction in the context of prediction markets and other crowdsourcing systems. In recent years, she has been focusing on fair and interpretable machine learning. Jenn came to MSR in 2012 from UCLA, where she was an assistant professor in the computer science department. She completed her Ph.D. at the University of Pennsylvania in 2009, and subsequently spent a year as a Computing Innovation Fellow at Harvard. She is the recipient of Penn’s 2009 Rubinoff dissertation award for innovative applications of computer technology, a National Science Foundation CAREER award, a Presidential Early Career Award for Scientists and Engineers (PECASE), and a handful of best paper awards. In her “spare” time, Jenn is involved in a variety of efforts to provide support for women in computer science; most notably, she co-founded the Annual Workshop for Women in Machine Learning, which has been held each year since 2006.

Talk: Why is fair machine learning hard and how can theory help?

The potential for machine learning systems to amplify social inequities and unfairness is receiving increasing attention in industry, academia, and the popular press.  In this talk, I will explore ways that theoretical modeling can be used as a tool to understand and address the challenges faced in making machine learning systems more fair, along with some pitfalls to avoid.

In the first part of the talk, I’ll discuss two projects in which we use mathematical models to provide insight into potential sources of unfairness in machine learning systems.  The first source of unfairness stems from different populations’ differing abilities to strategically manipulate the way that they appear in order to receive a better classification.  For example, if students know that SAT scores impact college admissions decisions, those who have the means to do so will artificially boost their scores by taking SAT prep courses or hiring tutors.  Our game-theoretic analysis shows how the relative advantage of privileged groups can be perpetuated in settings like this, and that this problem is not easy to fix.  Returning to the college admissions example, we show that subsidizing SAT prep courses for disadvantaged groups can have the counterintuitive effect of making those students even worse off, since it allows the bar for admissions to be set higher.

The second source of unfairness we explore stems from the need of online learning algorithms, which are widely used to power search and content optimization on the web, to explore unknown actions, potentially sacrificing the experience of current users for information that will lead to better decisions in the future.  We ask whether the process of exploration itself can lead to unfairness, placing too much burden on certain individuals or groups.  Specifically, we initiate the study of the “externalities of exploration,” the undesirable side effects that the presence of one party may impose on another, under the linear contextual bandits model.  We show that the very presence of one group can negatively impact another group in unpredictable ways, and that in a precise sense, no algorithm can avoid it.

In the second part, I’ll discuss opportunities for machine learning theory (and mathematical models more generally) to have a positive impact on fairness in practice.  I’ll draw on results from the first systematic investigation of commercial product teams’ challenges and needs for support in developing fairer ML systems.  By analyzing data collected from 35 semi-structured interviews and an anonymous survey of 267 machine learning practitioners, we identify areas of alignment and disconnect between the challenges faced by teams in practice and the solutions proposed in the fair machine learning research literature.  Based on these findings, I will highlight directions for future research to better address industry practitioners’ needs.