Individual Arbitrariness and Group Fairness

Authors: Carol Long, Hsiang Hsu, Wael Alghamdi, Flavio Calmon

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present empirical results to show that arbitrariness is masked by favorable group-fairness and accuracy metrics for multiple fairness intervention methods, baseline models, and datasets. We also demonstrate the effectiveness of the ensemble in reducing the predictive multiplicity of fair models.
Researcher Affiliation | Academia | John A. Paulson School of Engineering and Applied Sciences, Harvard University, Boston, MA 02134. Emails: carol_long@g.harvard.edu, alghamdi@g.harvard.edu, flavio@seas.harvard.edu.
Pseudocode | No | The paper describes methods in paragraph text and does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code can be found at https://github.com/Carol-Long/Fairness_and_Arbitrariness
Open Datasets | Yes | We report predictive multiplicity and benchmark the ensemble method on three datasets: two datasets in the education domain, the high-school longitudinal study (HSLS) dataset [27, 28] and the ENEM dataset [16] (see Alghamdi et al. [2], Appendix B.1), and the UCI Adult dataset [33], which is based on US census income data.
Dataset Splits | Yes | First, split the data into training, validation, and test sets. ... We use the validation set to measure ε corresponding to this empirical Rashomon Set. (See the data-split sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions software such as scikit-learn, the AIF360 toolkit, and the pandas package, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | For logistic regression and gradient boosting, the default hyperparameters are used; for random forest, we set the number of trees and the minimum number of samples per leaf to 10 to prevent over-fitting. To get 10 competing models for each hypothesis class, we use 10 random seeds (specifically 33 to 42). (See the training-setup sketch after the table.)
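
The "Dataset Splits" row describes a three-way split followed by a validation-set measurement of ε for the empirical Rashomon Set. Below is a minimal sketch, not the authors' code: it assumes a 60/20/20 split (the proportions are not quoted above) and assumes ε is taken as the largest validation-loss gap between a competing model and the best one, which is one common reading of "measure ε corresponding to this empirical Rashomon Set". All function names are hypothetical.

```python
# Minimal sketch (not the authors' implementation) of the quoted split-and-epsilon step.
# Assumptions: 60/20/20 split proportions; epsilon = worst validation-loss gap
# relative to the best of the competing models.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss


def split_data(X, y, seed=0):
    """Split into train (60%), validation (20%), and test (20%) sets."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, random_state=seed, stratify=y
    )
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, random_state=seed, stratify=y_tmp
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


def empirical_epsilon(models, X_val, y_val):
    """Epsilon spanned by a set of competing classifiers on the validation set:
    the gap between the worst and the best validation log-loss."""
    losses = np.array(
        [log_loss(y_val, m.predict_proba(X_val)[:, 1]) for m in models]
    )
    return float(losses.max() - losses.min())
```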
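
The "Experiment Setup" row fixes the hyperparameters and the ten seeds (33 to 42) used to obtain competing models. The sketch below shows one way this could look with scikit-learn; it is an assumption rather than the released code, and the quoted text does not say exactly what the seed controls (here it is passed as each estimator's random_state, though it could equally well control data subsampling or the split).

```python
# Minimal sketch (not the authors' implementation) of the quoted experiment setup:
# default hyperparameters for logistic regression and gradient boosting,
# 10 trees and 10 samples per leaf for random forest, and seeds 33-42 to obtain
# 10 competing models per hypothesis class.
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

SEEDS = range(33, 43)  # ten seeds: 33, 34, ..., 42


def make_model(name, seed):
    if name == "logistic_regression":
        return LogisticRegression(random_state=seed)          # defaults
    if name == "gradient_boosting":
        return GradientBoostingClassifier(random_state=seed)  # defaults
    if name == "random_forest":
        return RandomForestClassifier(
            n_estimators=10, min_samples_leaf=10, random_state=seed
        )
    raise ValueError(f"unknown hypothesis class: {name}")


def competing_models(name, X_train, y_train):
    """Fit one model per seed, yielding 10 competing models for the class."""
    return [make_model(name, seed).fit(X_train, y_train) for seed in SEEDS]
```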