Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making
Authors: Sina Aghaei, Mohammad Javad Azizi, Phebe Vayanos
Pages: 1418-1426
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive computational studies that show that our framework improves the state-of-the-art in the field (which typically relies on heuristics) to yield non-discriminative decisions at lower cost to overall accuracy. We evaluate our approach on 3 datasets: (A) The Default dataset of Taiwanese credit card users (Dheeru and Karra Taniskidou 2017; Yeh and Lien 2009) with \|N\| = 30,000 and d = 23 features, where we predict whether individuals will default and the protected attribute is gender; (B) The Adult dataset (Dheeru and Karra Taniskidou 2017; Kohavi 1996) with \|N\| = 45,000, d = 13, where we predict if an individual earns more than $50k per year and the protected attribute is race; (C) The COMPAS dataset (Angwin et al. 2016; Corbett-Davies et al. 2017) with \|N\| = 10,500 data points and d = 16, where we predict if a convicted individual will commit a violent crime and the protected attribute is race. |
| Researcher Affiliation | Academia | Sina Aghaei, Mohammad Javad Azizi, Phebe Vayanos CAIS Center for Artificial Intelligence in Society University of Southern California, Los Angeles, CA 90007 {saghaei,azizim,phebe.vayanos}@usc.edu |
| Pseudocode | No | The paper provides a mathematical formulation (MILP) but does not contain a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | We evaluate our approach on 3 datasets: (A) The Default dataset of Taiwanese credit card users (Dheeru and Karra Taniskidou 2017; Yeh and Lien 2009)... (B) The Adult dataset (Dheeru and Karra Taniskidou 2017; Kohavi 1996)... (C) The COMPAS dataset (Angwin et al. 2016; Corbett-Davies et al. 2017)... We evaluate our approach on the Crime dataset (Dheeru and Karra Taniskidou 2017; Redmond and Baveja 2002)... |
| Dataset Splits | Yes | We do k-fold cross validation where for classification (regression) k is 5 (4). |
| Hardware Specification | Yes | We modeled the MIP using JuMP in Julia (Dunning, Huchette, and Lubin 2017) and solved it using Gurobi 7.5.2 on a computer node with 20 CPUs and 64 GB of RAM. |
| Software Dependencies | Yes | We modeled the MIP using JuMP in Julia (Dunning, Huchette, and Lubin 2017) and solved it using Gurobi 7.5.2 on a computer node with 20 CPUs and 64 GB of RAM. |
| Experiment Setup | Yes | For each (fold, approach) pair, we select the optimal λ (call it λ*) in the objective (6) as follows: for each λ in {0, 0.1, 0.2, ...}, we compute the tree on the fold using the given approach and determine the associated discrimination level on the fold; we stop when the discrimination level is < 0.01% and return λ as λ*; we then evaluate accuracy (misclassification rate/MAE) and discrimination of the classification/regression tree associated with λ* on the test set and add this as a point in the corresponding graph in Figure 1. We imposed a 5 (10) hour solve time limit for classification (regression). |
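
The λ-selection loop quoted in the Experiment Setup row can be sketched in Python as follows. This is a minimal illustration, not the authors' code, which is not released and was written in Julia with JuMP. The names `select_lambda`, `fit_tree`, `discrimination`, and `max_lambda` are hypothetical. The sketch also assumes that discrimination is measured as a fraction, so that 0.01% becomes `1e-4`, and it adds an upper bound on λ because the quoted grid is open-ended.

```python
from typing import Callable, Optional, Tuple, TypeVar

Tree = TypeVar("Tree")


def select_lambda(
    fit_tree: Callable[[float], Tree],        # hypothetical: trains a tree on the fold for a given lambda
    discrimination: Callable[[Tree], float],  # hypothetical: discrimination of that tree on the fold
    step: float = 0.1,                        # grid spacing from the quoted text: 0, 0.1, 0.2, ...
    threshold: float = 1e-4,                  # assumption: 0.01% expressed as a fraction
    max_lambda: float = 10.0,                 # assumption: cap added here; the source grid has no stated end
) -> Tuple[Optional[float], Optional[Tree]]:
    """Return the first grid value of lambda whose tree falls below the
    discrimination threshold, together with that tree.

    Returns (None, None) if no grid value up to max_lambda qualifies.
    """
    k = 0
    while True:
        # Build the grid from integers to avoid accumulating float error.
        lam = round(k * step, 10)
        if lam > max_lambda:
            return None, None
        tree = fit_tree(lam)
        if discrimination(tree) < threshold:
            return lam, tree
        k += 1


# Toy stand-ins so the sketch runs end to end: the "tree" is just lambda
# itself, and discrimination decreases linearly as lambda grows.
def toy_fit(lam: float) -> float:
    return lam


def toy_discrimination(tree: float) -> float:
    return max(0.0, 0.03 - 0.1 * tree)


best_lambda, best_tree = select_lambda(toy_fit, toy_discrimination)
```

In an actual experiment, the tree returned for the selected λ would then be scored on the held-out fold for accuracy (misclassification rate or MAE) and discrimination, as the quoted setup describes.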