HMI DAIS 12 - The Importance of Modeling Data Missingness in Algorithmic Fairness
HMI DAIS 12 - Public online seminar, 9am 26 November 2020 AEST
Naman Goel (Swiss Federal Institute of Technology) gave the twelfth HMI Data, AI and Society public seminar.
Naman Goel is a postdoctoral researcher at the Swiss Federal Institute of Technology in Zurich (ETHZ). Before joining ETHZ, he completed his Ph.D. at the Swiss Federal Institute of Technology in Lausanne (EPFL), where his thesis has been nominated for a thesis distinction. He completed his undergraduate (and integrated master's) studies at the Indian Institute of Technology (IIT), Varanasi, where he was awarded the institute medal for standing first in his class. He has published several research papers at premier AI conferences such as AAAI, IJCAI, UAI and AIES. He and his co-authors received the ACM Web Science 2020 Best Paper Award.
Title: The Importance of Modeling Data Missingness in Algorithmic Fairness
Abstract: Training datasets for machine learning often have some form of missingness. For example, to learn a model for deciding whom to give a loan, the available training data includes individuals who were given a loan in the past, but not those who were not. This missingness, if ignored, nullifies any fairness guarantee of the training procedure when the model is deployed. Using causal graphs, we characterize the missingness mechanisms in different real-world scenarios. We show conditions under which various distributions, used in popular fairness algorithms, can or cannot be recovered from the training data. Our theoretical results imply that many of these algorithms cannot guarantee fairness in practice. Modeling missingness also helps to identify correct design principles for fair algorithms. For example, in multi-stage settings where decisions are made in multiple screening rounds, we use our framework to derive the minimal distributions required to design a fair algorithm. Our proposed algorithm also decentralizes the decision-making process and still achieves similar performance to the optimal algorithm that requires centralization and non-recoverable distributions.
HMI DAIS recordings are available to view online.