reproducibilityindex.ai

Individual Arbitrariness and Group Fairness

Authors: Carol Long, Hsiang Hsu, Wael Alghamdi, Flavio Calmon

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present empirical results to show that arbitrariness is masked by favorable group-fairness and accuracy metrics for multiple fairness intervention methods, baseline models, and datasets. We also demonstrate the effectiveness of the ensemble in reducing the predictive multiplicity of fair models.
Researcher Affiliation	Academia	John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134. Emails: carol_long@g.harvard.edu, alghamdi@g.harvard.edu, flavio@seas.harvard.edu.
Pseudocode	No	The paper describes methods in paragraph text and does not include any explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Code can be found at https://github.com/Carol-Long/Fairness_and_Arbitrariness
Open Datasets	Yes	We report predictive multiplicity and benchmark the ensemble method on three datasets: two datasets in the education domain, the high-school longitudinal study (HSLS) dataset [27, 28] and the ENEM dataset [16] (see Alghamdi et al. [2] Appendix B.1), and the UCI Adult dataset [33], which is based on the US census income data.
Dataset Splits	Yes	First, split the data into training, validation, and test datasets. ... We use the validation set to measure \epsilon corresponding to this empirical Rashomon Set.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies	No	The paper mentions software such as the Scikit-learn and AIF360 toolkits and the pandas package, but does not provide specific version numbers for these dependencies.
Experiment Setup	Yes	For logistic regression and gradient boosting, the default hyperparameters are used; for random forest, we set the number of trees and the minimum number of samples per leaf to 10 to prevent over-fitting. To get 10 competing models for each hypothesis class, we use 10 random seeds (specifically 33-42).
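
The Dataset Splits and Experiment Setup rows can be illustrated with a minimal sketch. It makes three assumptions that are not confirmed by the entry: the seed list is read as the ten consecutive integers from 33 to 42, the hyperparameter keys follow scikit-learn keyword names (`n_estimators`, `min_samples_leaf`, `random_state`), and the empirical Rashomon parameter \epsilon is taken as the gap between the worst and best validation loss among the competing models. The function names and the example losses are hypothetical.

```python
# Sketch of the experiment configuration described in the table above.
# Assumption: the seeds are the ten consecutive integers 33 through 42.
SEEDS = list(range(33, 43))

# Per-class hyperparameter overrides; an empty dict means library defaults.
# Assumption: keys mirror scikit-learn keyword argument names.
OVERRIDES = {
    "logistic_regression": {},
    "gradient_boosting": {},
    "random_forest": {"n_estimators": 10, "min_samples_leaf": 10},
}


def competing_configs(model_name):
    """Return one keyword-argument dict per seed for a hypothesis class."""
    base = OVERRIDES[model_name]
    return [{**base, "random_state": seed} for seed in SEEDS]


def empirical_rashomon_epsilon(validation_losses):
    """Gap between the worst and best validation loss of competing models.

    Assumption: this is one plausible way to measure epsilon on the
    validation set; the paper's exact procedure may differ.
    """
    if not validation_losses:
        raise ValueError("need at least one validation loss")
    return max(validation_losses) - min(validation_losses)


if __name__ == "__main__":
    configs = competing_configs("random_forest")
    print(len(configs), configs[0])
    # Hypothetical validation losses, for illustration only.
    print(empirical_rashomon_epsilon([0.31, 0.29, 0.30]))
```

Each dict returned by `competing_configs` could be unpacked into the matching estimator constructor, giving 10 competing models per hypothesis class that differ only in their random seed.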