Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making

Authors: Sina Aghaei, Mohammad Javad Azizi, Phebe Vayanos
Pages: 1418-1426

AAAI 2019

Reproducibility Variable: Result. LLM Response
Research Type: Experimental. We conduct extensive computational studies that show that our framework improves the state-of-the-art in the field (which typically relies on heuristics) to yield non-discriminative decisions at lower cost to overall accuracy. We evaluate our approach on 3 datasets: (A) The Default dataset of Taiwanese credit card users (Dheeru and Karra Taniskidou 2017; Yeh and Lien 2009) with |N| = 30,000 and d = 23 features, where we predict whether individuals will default and the protected attribute is gender; (B) The Adult dataset (Dheeru and Karra Taniskidou 2017; Kohavi 1996) with |N| = 45,000, d = 13, where we predict if an individual earns more than $50k per year and the protected attribute is race; (C) The COMPAS dataset (Angwin et al. 2016; Corbett-Davies et al. 2017) with |N| = 10,500 data points and d = 16, where we predict if a convicted individual will commit a violent crime and the protected attribute is race.
Researcher Affiliation: Academia. Sina Aghaei, Mohammad Javad Azizi, Phebe Vayanos; CAIS Center for Artificial Intelligence in Society, University of Southern California, Los Angeles, CA 90007; {saghaei,azizim,phebe.vayanos}@usc.edu
Pseudocode: No. The paper provides a mathematical formulation (MILP) but does not contain a structured pseudocode or algorithm block.
Open Source Code: No. The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets: Yes. We evaluate our approach on 3 datasets: (A) The Default dataset of Taiwanese credit card users (Dheeru and Karra Taniskidou 2017; Yeh and Lien 2009)... (B) The Adult dataset (Dheeru and Karra Taniskidou 2017; Kohavi 1996)... (C) The COMPAS dataset (Angwin et al. 2016; Corbett-Davies et al. 2017)... We evaluate our approach on the Crime dataset (Dheeru and Karra Taniskidou 2017; Redmond and Baveja 2002)...
Dataset Splits: Yes. We do k-fold cross validation where for classification (regression) k is 5 (4).
Hardware Specification: Yes. We modeled the MIP using JuMP in Julia (Dunning, Huchette, and Lubin 2017) and solved it using Gurobi 7.5.2 on a computer node with 20 CPUs and 64 GB of RAM.
Software Dependencies: Yes. We modeled the MIP using JuMP in Julia (Dunning, Huchette, and Lubin 2017) and solved it using Gurobi 7.5.2 on a computer node with 20 CPUs and 64 GB of RAM.
Experiment Setup: Yes. For each (fold, approach) pair, we select the optimal λ (call it λ*) in the objective (6) as follows: for each λ in {0, 0.1, 0.2, ...}, we compute the tree on the fold using the given approach and determine the associated discrimination level on the fold; we stop when the discrimination level is < 0.01% and return λ as λ*; we then evaluate accuracy (misclassification rate/MAE) and discrimination of the classification/regression tree associated with λ* on the test set and add this as a point in the corresponding graph in Figure 1. We imposed a 5 (10) hour solve time limit for classification (regression).
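The fold construction and the per-fold choice of λ quoted above can be illustrated with a short Python sketch. This is not the authors' code (the paper releases none, and their model was built in JuMP/Julia with Gurobi); it only mirrors the quoted procedure. The names kfold_indices, select_lambda, fit, discrimination, and max_steps are hypothetical. The sketch assumes the discrimination level is returned as a fraction, so that "< 0.01%" corresponds to tol = 1e-4, and it assumes contiguous, unshuffled folds. The source elides the upper end of the λ grid, so the caller-supplied max_steps cap is an assumption as well.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def kfold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous folds over range(n).

    Assumption: no shuffling; the source does not say how folds were drawn.
    """
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        test = list(range(bounds[i], bounds[i + 1]))
        train = list(range(0, bounds[i])) + list(range(bounds[i + 1], n))
        yield train, test


def select_lambda(
    fit: Callable[[Sequence[int], float], Model],
    discrimination: Callable[[Model, Sequence[int]], float],
    train: Sequence[int],
    step: float = 0.1,
    tol: float = 1e-4,      # assumed encoding of "< 0.01%" as a fraction
    max_steps: int = 100,   # hypothetical cap; the source grid is open-ended
) -> Optional[Tuple[float, Model]]:
    """Sweep lambda = 0, step, 2*step, ... and stop at the first value whose
    fitted model has discrimination below tol on the training fold.

    fit(train, lam) and discrimination(model, train) are caller-supplied
    stand-ins for the tree solver and the discrimination measure.
    Returns (lambda, model), or None if the cap is reached first.
    """
    for i in range(max_steps + 1):
        lam = round(i * step, 10)  # avoid drift such as 0.30000000000000004
        model = fit(train, lam)
        if discrimination(model, train) < tol:
            return lam, model
    return None


if __name__ == "__main__":
    # Toy stand-ins: the "model" is just lambda, and discrimination drops
    # to zero once lambda reaches 0.3.
    toy_fit = lambda idx, lam: lam
    toy_disc = lambda model, idx: 0.0 if model >= 0.3 else 0.05
    for train, test in kfold_indices(10, 5):
        print(select_lambda(toy_fit, toy_disc, train), test)
```

In an actual replication, fit would wrap the MILP solve (with the solver time limit applied there), and the returned model would then be scored for accuracy and discrimination on the held-out test indices of the same fold.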