TYPES
Towards transparency and privacy in the online advertising business

Exposing the probabilistic causal structure of discrimination

Authors: Francesco Bonchi, Sara Hajian, Bud Mishra, Daniele Ramazzotti

Partners involved: EURECAT

Published in: Int J Data Sci Anal (2017) 3: 1

DOI: 10.1007/s41060-016-0040-z

Abstract: Discrimination discovery from data is an important data mining task, whose goal is to identify patterns of illegal and unethical discriminatory activities against protected-by-law groups, e.g., ethnic minorities. While any legally valid proof of discrimination requires evidence of causality, the state-of-the-art methods are essentially correlation based, albeit, as it is well known, correlation does not imply causation. In this paper, we take a principled causal approach to discrimination detection following Suppes’ probabilistic causation theory. In particular, we define a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data. The result is a type of constrained Bayesian network, which we dub Suppes-Bayes causal network (SBCN). Next, we develop a toolkit of methods based on random walks on top of the SBCN, addressing different anti-discrimination legal concepts, such as direct and indirect discrimination, group and individual discrimination, genuine requirement, and favoritism. Our experiments on real-world datasets confirm the inferential power of our approach in all these different tasks.

Research highlights:

  • Discrimination discovery from data is an important task aiming at identifying patterns of illegal and unethical discriminatory activities against protected-by-law groups, e.g., ethnic minorities.
  • While any legally valid proof of discrimination requires evidence of causality, the state-of-the-art methods are essentially correlation-based, even though, as is well known, correlation does not imply causation.
  • In this paper we take a principled causal approach to the data mining problem of discrimination detection in databases. Following Suppes’ probabilistic causation theory, we define a method to extract, from a dataset of historical decision records, the causal structures existing among the attributes in the data. The result is a type of constrained Bayesian network, which we dub Suppes-Bayes Causal Network (SBCN).
  • Next, we develop a toolkit of methods based on random walks on top of the SBCN, addressing different anti-discrimination legal concepts, such as direct and indirect discrimination, group and individual discrimination, genuine requirement, and favoritism.
  • Our experiments on real-world datasets confirm the inferential power of our approach in all these different tasks.
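
As a rough illustration of the kind of test that Suppes' probabilistic causation builds on, the sketch below checks the probability-raising condition, P(effect | cause) > P(effect | not cause), on a handful of decision records. It is not the authors' method: the records, the attribute names (protected_group, denied) and the helper functions are hypothetical, and the sketch models neither the temporal-priority condition of Suppes' theory, nor the construction of the SBCN, nor the random-walk measures.

```python
from typing import Dict, List, Optional

# A decision record maps attribute names to boolean values (an assumption
# made for this sketch; real datasets have richer attribute types).
Record = Dict[str, bool]


def conditional_probability(
    records: List[Record], effect: str, cause: str, cause_value: bool
) -> Optional[float]:
    """Estimate P(effect is True | cause == cause_value) by simple counting.

    Returns None when no record has cause == cause_value, because the
    conditional probability is then undefined.
    """
    matching = [r for r in records if r[cause] == cause_value]
    if not matching:
        return None
    return sum(1 for r in matching if r[effect]) / len(matching)


def raises_probability(records: List[Record], cause: str, effect: str) -> bool:
    """Check the probability-raising condition P(e | c) > P(e | not c)."""
    p_with = conditional_probability(records, effect, cause, True)
    p_without = conditional_probability(records, effect, cause, False)
    if p_with is None or p_without is None:
        return False
    return p_with > p_without


# Hypothetical toy records, invented purely for illustration.
records: List[Record] = [
    {"protected_group": True, "denied": True},
    {"protected_group": True, "denied": True},
    {"protected_group": True, "denied": False},
    {"protected_group": False, "denied": True},
    {"protected_group": False, "denied": False},
    {"protected_group": False, "denied": False},
]

if __name__ == "__main__":
    print(raises_probability(records, "protected_group", "denied"))
```

Probability raising alone is symmetric (on these toy records it also holds with the two attributes swapped), which is why Suppes' theory additionally requires the cause to precede the effect.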

