Algorithms and decision making based on Big Data have become pervasive in all aspects of our daily lives, both offline and online, as they have become essential tools in personal finance, health care, hiring, housing, education, and public policy. Data and algorithms determine the media we consume, the stories we read, the people we meet, and the places we visit, but also whether we get a job or whether our loan request is approved. It is therefore of societal and ethical importance to ask whether these algorithms can discriminate on grounds such as gender, ethnicity, or marital or health status. It turns out that the answer is positive: for instance, recent studies have shown that Google's online advertising system displayed ads for high-income jobs to men much more often than it did to women, and that ads for arrest records were significantly more likely to appear in searches for distinctively black names or for a historically black fraternity.
This algorithmic bias exists even when there is no discriminatory intent on the part of the algorithm's developer. Sometimes it is inherent to the data sources used (software making decisions based on data can reflect, or even amplify, the results of historical discrimination), but even when the sensitive attributes have been suppressed from the input, a well-trained machine learning algorithm may still discriminate on the basis of those attributes because of correlations existing in the data.
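This proxy effect can be illustrated with a minimal, purely hypothetical sketch: the decision rule below never sees the sensitive attribute `group`, yet because `district` is correlated with it in the (synthetic) data, approval rates still differ sharply between groups.

```python
import random

random.seed(0)

# Hypothetical synthetic population: 'district' is a non-sensitive proxy
# that happens to be correlated with group membership.
population = []
for _ in range(10_000):
    group = random.choice(["A", "B"])
    # Group membership skews which district a person lives in (the correlation).
    district = random.random() < (0.8 if group == "A" else 0.2)
    population.append((group, district))

# A rule using only non-sensitive attributes can still pick up the proxy:
# here it approves anyone from the favored district.
def decision(district):
    return district

rate = {"A": 0.0, "B": 0.0}
count = {"A": 0, "B": 0}
for group, district in population:
    count[group] += 1
    rate[group] += decision(district)

for g in ("A", "B"):
    rate[g] /= count[g]

# Approval rates differ by group even though 'group' is never an input.
print(rate)
```

The gap between the two approval rates (roughly 0.8 vs. 0.2 in this construction) is exactly the kind of indirect discrimination that suppressing the sensitive attribute alone cannot prevent.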
From a technical point of view, efforts at fighting algorithmic bias have led to two groups of solutions: (1) techniques for discrimination discovery from data, and (2) discrimination prevention by means of fairness-aware data mining, i.e., developing data mining systems that are discrimination-conscious by design.
To address the first problem, recent studies take a principled causal approach to the data mining problem of discrimination detection in databases. Researchers define a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data. The result is a type of constrained Bayesian network, which they dub a Suppes-Bayes Causal Network (SBCN). Next, they develop a toolkit of measures based on random walks over the SBCN, addressing different anti-discrimination legal concepts such as direct and indirect discrimination, group and individual discrimination, genuine requirement, and favoritism.
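To make the random-walk idea concrete, here is a simplified sketch on a hypothetical hand-built causal graph. This is not the authors' actual SBCN construction or their exact measures: a real SBCN is learned from data under Suppes' probability-raising constraints, and the published measures are more refined. The sketch only shows the basic intuition that the fraction of random walks connecting a sensitive attribute to the decision node can serve as a discrimination score.

```python
import random

random.seed(42)

# Hypothetical toy causal graph (edges point cause -> effect).
# 'gender' influences the decision only indirectly, via 'job_type'.
edges = {
    "gender":    ["job_type", "hobby"],
    "job_type":  ["decision"],
    "hobby":     [],
    "education": ["decision"],
    "decision":  [],
}

def walk_reaches(graph, start, target, max_steps=10):
    """Run one random walk along causal edges; True if it hits `target`."""
    node = start
    for _ in range(max_steps):
        if node == target:
            return True
        successors = graph.get(node, [])
        if not successors:
            return False
        node = random.choice(successors)
    return node == target

def reachability_score(graph, start, target, n_walks=5_000):
    """Fraction of walks from `start` that reach `target` -- a crude
    stand-in for the random-walk-based discrimination measures."""
    hits = sum(walk_reaches(graph, start, target) for _ in range(n_walks))
    return hits / n_walks

s_gender = reachability_score(edges, "gender", "decision")
s_edu = reachability_score(edges, "education", "decision")
print(s_gender, s_edu)
```

In this toy graph, walks from `gender` reach `decision` about half the time (via the `job_type` proxy), while walks from `education` always reach it; comparing such scores across attributes is the rough idea behind measuring direct versus indirect influence on the outcome.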
Resources from the Fairness, Accountability and Transparency in Machine Learning workshop present recent studies addressing both of the above problems and examine the role that machines play in consequential decisions in areas such as employment, health care, and policing.
This is an emerging research area with plenty of open questions, in dire need of theoretical results as well as practical tools for researchers and practitioners.