Aleatoric and Epistemic Discrimination: Fundamental Limits of Fairness Interventions
Authors: Hao Wang, Luxi He, Rui Gao, Flavio Calmon
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments are semi-synthetic since we apply fairness interventions to train classifiers using the entire dataset and resample from it as the test set. This setup enables us to eliminate the estimation error associated with Algorithm 1 (see Appendix E for a discussion). (A sketch of this resampling protocol follows the table.) |
| Researcher Affiliation | Collaboration | Hao Wang, MIT-IBM Watson AI Lab (hao@ibm.com); Luxi (Lucy) He, Harvard College (luxihe@college.harvard.edu); Rui Gao, The University of Texas at Austin (rui.gao@mccombs.utexas.edu); Flavio P. Calmon, Harvard University (flavio@seas.harvard.edu) |
| Pseudocode | Yes | Algorithm 1 Approximate the fairness Pareto frontier. |
| Open Source Code | No | The paper references third-party libraries and code for benchmark methods (e.g., the IBM AIF360 library, Python implementations from a GitHub repo), but it does not state that the code for its *own* methodology is open source, nor does it provide a link. |
| Open Datasets | Yes | We evaluate our results on the UCI Adult dataset (Bache and Lichman, 2013), the ProPublica COMPAS dataset (Angwin et al., 2016), the German Credit dataset (Bache and Lichman, 2013), and the HSLS (High School Longitudinal Study) dataset (Ingels et al., 2011; Jeong et al., 2022). |
| Dataset Splits | No | The paper mentions training classifiers using the entire dataset and resampling for testing, but does not provide specific train/validation/test splits, percentages, or absolute counts required for reproduction. It states 'resample from it as the test set' without further detail on how the training data is managed for validation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'IBM AIF360 library', 'Python implementations from the Github repo in Alghamdi et al. (2022)', and 'Scikit-learn (Pedregosa et al., 2011)'. While these software components are named, no version numbers are given, which a fully reproducible description would require. |
| Experiment Setup | Yes | We run Algorithm 1 with k = 6 pieces, 20 iterations, and varying α_EO to estimate FairFront on each dataset. We compute the expectations and the g function from the empirical distributions and solve the DC program using the package in Shen et al. (2016). ... For the Adult dataset, we use Random Forest with `n_estimators=15, min_samples_leaf=3, criterion='log_loss', bootstrap=False` as our baseline classifier; for the COMPAS dataset, we use Random Forest with `n_estimators=17`; for the German Credit dataset, we use Random Forest with `n_estimators=100, min_samples_split=2, min_samples_leaf=1`. (A sketch of these baseline configurations also follows the table.) |
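The semi-synthetic protocol quoted in the Research Type row (train on the full dataset, then resample from it as the test set) reads as a bootstrap-style evaluation. Below is a minimal sketch under that reading; sampling with replacement, the resample size, the fixed seed, and the `resample_test_set` name are all assumptions, since the paper does not specify them.

```python
import numpy as np

def resample_test_set(X, y, n_samples=None, seed=0):
    """Draw a test set by resampling the full dataset (X, y as numpy arrays).

    Sampling with replacement, the resample size, and the fixed seed are
    assumptions; the paper only says it resamples from the training data.
    """
    rng = np.random.default_rng(seed)
    n = len(X) if n_samples is None else n_samples
    idx = rng.integers(0, len(X), size=n)  # indices drawn with replacement
    return X[idx], y[idx]
```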
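For the Experiment Setup row, the quoted Random Forest baselines map directly onto scikit-learn's `RandomForestClassifier`. The sketch below instantiates them with the stated hyperparameters only; every unlisted parameter falls back to the scikit-learn default (an assumption), the dictionary keys are hypothetical, and `criterion='log_loss'` requires scikit-learn 1.1 or newer.

```python
from sklearn.ensemble import RandomForestClassifier

# Per-dataset baselines with the hyperparameters quoted in the table;
# all other parameters are left at scikit-learn defaults (an assumption).
BASELINES = {
    "adult": RandomForestClassifier(
        n_estimators=15,
        min_samples_leaf=3,
        criterion="log_loss",  # requires scikit-learn >= 1.1
        bootstrap=False,
    ),
    "compas": RandomForestClassifier(n_estimators=17),
    "german": RandomForestClassifier(
        n_estimators=100, min_samples_split=2, min_samples_leaf=1,
    ),
}

# Hypothetical usage, following the paper's train-on-everything protocol:
# clf = BASELINES["adult"].fit(X_full, y_full)
# X_test, y_test = resample_test_set(X_full, y_full)
```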