Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions
Authors: Hao Wang, Luxi He, Rui Gao, Flavio Calmon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments are semi-synthetic since we apply fairness interventions to train classifiers using the entire dataset and resample from it as the test set. This setup enables us to eliminate the estimation error associated with Algorithm 1 (see Appendix E for a discussion). (A sketch of this resampling protocol follows the table.) |
| Researcher Affiliation | Collaboration | Hao Wang, MIT-IBM Watson AI Lab (hao@ibm.com); Luxi (Lucy) He, Harvard College (luxihe@college.harvard.edu); Rui Gao, The University of Texas at Austin (rui.gao@mccombs.utexas.edu); Flavio P. Calmon, Harvard University (flavio@seas.harvard.edu) |
| Pseudocode | Yes | Algorithm 1 Approximate the fairness Pareto frontier. |
| Open Source Code | No | The paper references third-party libraries and code for benchmark methods (e.g., the IBM AIF360 library, Python implementations from a GitHub repo), but it does not state that the code for its *own* methodology is open source, nor does it provide a link. |
| Open Datasets | Yes | We evaluate our results on the UCI Adult dataset (Bache and Lichman, 2013), the ProPublica COMPAS dataset (Angwin et al., 2016), the German Credit dataset (Bache and Lichman, 2013), and the HSLS (High School Longitudinal Study) dataset (Ingels et al., 2011; Jeong et al., 2022). |
| Dataset Splits | No | The paper mentions training classifiers using the entire dataset and resampling for testing, but does not provide specific train/validation/test splits, percentages, or absolute counts required for reproduction. It states 'resample from it as the test set' without further detail on how the training data is managed for validation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'IBM AIF360 library', 'Python implementations from the Github repo in Alghamdi et al. (2022)', and 'Scikit-learn (Pedregosa et al., 2011)'. While these software components are named, no version numbers are given, which a fully reproducible description would require. |
| Experiment Setup | Yes | We run Algorithm 1 with k = 6 pieces, 20 iterations, and varying α_EO to estimate FairFront on each dataset. We compute the expectations and the g function from the empirical distributions and solve the DC program using the package in Shen et al. (2016). ... For the Adult dataset, we use Random Forest with `n_estimators=15, min_samples_leaf=3, criterion='log_loss', bootstrap=False` as our baseline classifier; for the COMPAS dataset, we use Random Forest with `n_estimators=17`; for the German Credit dataset, we use Random Forest with `n_estimators=100, min_samples_split=2, min_samples_leaf=1`. (A sketch of these baseline configurations also follows the table.) |
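The semi-synthetic protocol quoted in the Research Type row (train on the full dataset, then resample from it as the test set) reads as a bootstrap-style evaluation. Below is a minimal sketch under that reading; sampling with replacement, the resample size, the fixed seed, and the `resample_test_set` name are all assumptions, since the paper does not specify them.

```python
import numpy as np

def resample_test_set(X, y, n_samples=None, seed=0):
    """Draw a test set by resampling the full dataset (X, y as numpy arrays).

    Sampling with replacement, the resample size, and the fixed seed are
    assumptions; the paper only says it resamples from the training data.
    """
    rng = np.random.default_rng(seed)
    n = len(X) if n_samples is None else n_samples
    idx = rng.integers(0, len(X), size=n)  # indices drawn with replacement
    return X[idx], y[idx]
```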
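For the Experiment Setup row, the quoted Random Forest baselines map directly onto scikit-learn's `RandomForestClassifier`. The sketch below instantiates them with the stated hyperparameters only; every unlisted parameter falls back to the scikit-learn default (an assumption), the dictionary keys are hypothetical, and `criterion='log_loss'` requires scikit-learn 1.1 or newer.

```python
from sklearn.ensemble import RandomForestClassifier

# Per-dataset baselines with the hyperparameters quoted in the table;
# all other parameters are left at scikit-learn defaults (an assumption).
BASELINES = {
    "adult": RandomForestClassifier(
        n_estimators=15,
        min_samples_leaf=3,
        criterion="log_loss",  # requires scikit-learn >= 1.1
        bootstrap=False,
    ),
    "compas": RandomForestClassifier(n_estimators=17),
    "german": RandomForestClassifier(
        n_estimators=100, min_samples_split=2, min_samples_leaf=1,
    ),
}

# Hypothetical usage, following the paper's train-on-everything protocol:
# clf = BASELINES["adult"].fit(X_full, y_full)
# X_test, y_test = resample_test_set(X_full, y_full)
```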