reproducibilityindex.ai

Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making

Authors: Sina Aghaei, Mohammad Javad Azizi, Phebe Vayanos | pp. 1418-1426

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive computational studies that show that our framework improves the state-of-the-art in the field (which typically relies on heuristics) to yield non-discriminative decisions at lower cost to overall accuracy. We evaluate our approach on 3 datasets: (A) The Default dataset of Taiwanese credit card users (Dheeru and Karra Taniskidou 2017; Yeh and Lien 2009) with |N| = 30,000 and d = 23 features, where we predict whether individuals will default and the protected attribute is gender; (B) The Adult dataset (Dheeru and Karra Taniskidou 2017; Kohavi 1996) with |N| = 45,000, d = 13, where we predict if an individual earns more than $50k per year and the protected attribute is race; (C) The COMPAS dataset (Angwin et al. 2016; Corbett-Davies et al. 2017) with |N| = 10,500 data points and d = 16, where we predict if a convicted individual will commit a violent crime and the protected attribute is race.
Researcher Affiliation	Academia	Sina Aghaei, Mohammad Javad Azizi, Phebe Vayanos CAIS Center for Artificial Intelligence in Society University of Southern California, Los Angeles, CA 90007 {saghaei,azizim,phebe.vayanos}@usc.edu
Pseudocode	No	The paper provides a mathematical formulation (MILP) but does not contain a structured pseudocode or algorithm block.
Open Source Code	No	The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper.
Open Datasets	Yes	We evaluate our approach on 3 datasets: (A) The Default dataset of Taiwanese credit card users (Dheeru and Karra Taniskidou 2017; Yeh and Lien 2009)... (B) The Adult dataset (Dheeru and Karra Taniskidou 2017; Kohavi 1996)... (C) The COMPAS dataset (Angwin et al. 2016; Corbett-Davies et al. 2017)... We evaluate our approach on the Crime dataset (Dheeru and Karra Taniskidou 2017; Redmond and Baveja 2002)...
Dataset Splits	Yes	We do k-fold cross validation where for classification (regression) k is 5 (4).
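The split protocol above fixes only k (5 for classification, 4 for regression). The following is a minimal sketch of plain k-fold index splitting under stated assumptions: the paper does not say whether folds are shuffled or stratified, so this sketch shuffles with a fixed seed and does not stratify. The function name `k_fold_indices` and the `K_FOR_TASK` table are hypothetical, introduced only for illustration.

```python
import random
from typing import List, Tuple

# Values of k reported in the paper; the dictionary itself is illustrative.
K_FOR_TASK = {"classification": 5, "regression": 4}


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs.

    Every index lands in exactly one test fold. Assumption: indices are
    shuffled with a fixed seed and folds are not stratified by label or
    protected attribute (the paper does not specify either choice).
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)

    # The first n % k folds get one extra element, so sizes differ by at most 1.
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(idx[start:start + size])
        start += size

    # Fold f is the test set; the union of the remaining folds is the training set.
    splits = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((train, folds[f]))
    return splits
```

For example, `k_fold_indices(10500, K_FOR_TASK["classification"])` would yield 5 splits of the COMPAS-sized index range, each holding out 2,100 points.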
Hardware Specification	Yes	We modeled the MIP using JuMP in Julia (Dunning, Huchette, and Lubin 2017) and solved it using Gurobi 7.5.2 on a computer node with 20 CPUs and 64 GB of RAM.
Software Dependencies	Yes	We modeled the MIP using JuMP in Julia (Dunning, Huchette, and Lubin 2017) and solved it using Gurobi 7.5.2 on a computer node with 20 CPUs and 64 GB of RAM.
Experiment Setup	Yes	For each (fold, approach) pair, we select the optimal λ (call it λ*) in the objective (6) as follows: for each λ in {0, 0.1, 0.2, . . .}, we compute the tree on the fold using the given approach and determine the associated discrimination level on the fold; we stop when the discrimination level is < 0.01% and return λ as λ*; we then evaluate accuracy (misclassification rate/MAE) and discrimination of the classification/regression tree associated with λ* on the test set and add this as a point in the corresponding graph in Figure 1. We imposed a 5 (10) hour solve time limit for classification (regression).
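The λ* selection rule above is a simple upward sweep with an early stop. Below is a minimal sketch of that loop, assuming the tree solver and the discrimination measure are supplied as callables: `fit_tree` and `discrimination` are hypothetical placeholders, not the authors' code, and the actual trees come from solving the MILP. It also assumes discrimination is reported as a fraction, so 0.01% becomes `1e-4`. The paper states no upper bound on λ, so the `max_steps` cap is an added assumption that keeps the loop finite.

```python
from typing import Any, Callable, Optional, Tuple


def select_lambda(
    fit_tree: Callable[[float], Any],           # hypothetical: trains a tree on the fold for a given lambda
    discrimination: Callable[[Any], float],     # hypothetical: discrimination of that tree on the same fold
    threshold: float = 1e-4,                    # 0.01% expressed as a fraction
    step: float = 0.1,
    max_steps: int = 1000,                      # assumption: the paper gives no upper bound
) -> Optional[Tuple[float, Any]]:
    """Return (lambda*, tree) for the first lambda in 0, step, 2*step, ...
    whose tree has discrimination below threshold, or None if none does."""
    for s in range(max_steps + 1):
        # Multiplying an integer counter avoids accumulating float error (0.1 + 0.1 + ...).
        lam = round(s * step, 10)
        tree = fit_tree(lam)
        if discrimination(tree) < threshold:
            return lam, tree
    return None
```

The returned tree is then scored (misclassification rate or MAE, plus discrimination) on the held-out test set. With a toy stand-in where the "tree" is just λ and its discrimination is `max(0.0, 0.5 - λ)`, the sweep stops at λ* = 0.5.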