Ensuring Fairness Beyond the Training Data

Authors: Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette Wing, Daniel J. Hsu

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on standard machine learning fairness datasets suggest that, compared to the state-of-the-art fair classifiers, our classifier retains fairness guarantees and test accuracy for a large class of perturbations on the test set. Furthermore, our experiments show that there is an inherent trade-off between fairness robustness and accuracy of such classifiers.
Researcher Affiliation | Academia | Debmalya Mandal (dm3557@columbia.edu), Columbia University; Samuel Deng (sd3013@columbia.edu), Columbia University; Suman Jana (suman@cs.columbia.edu), Columbia University; Jeannette M. Wing (wing@columbia.edu), Columbia University; Daniel Hsu (djhsu@cs.columbia.edu), Columbia University
Pseudocode | Yes | ALGORITHM 1: Meta-Algorithm; ALGORITHM 2: Best Response of the λ-player; ALGORITHM 3: Approximate Fair Classifier (ApxFair)
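To make the meta-algorithm's structure concrete, the following is a loose Python sketch of the two-player no-regret loop it describes. This is an illustration under simplifying assumptions, not the authors' implementation: the weight player uses a multiplicative-weights heuristic targeting the demographic-parity gap, the classifier player best-responds with a weighted scikit-learn logistic regression (standing in for ApxFair), and the names dp_gap and meta_algorithm are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dp_gap(y_pred, a):
    """Demographic-parity gap: |P(h=1 | a=1) - P(h=1 | a=0)|."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

def meta_algorithm(X, y, a, T=10, eta=1.0):
    """Two-player loop: the weight player re-weights examples to expose
    fairness violations; the classifier player best-responds on the
    re-weighted data. The returned list is used as a uniform mixture."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # weight player's current perturbation
    ensemble = []
    for _ in range(T):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y, sample_weight=w * n)   # best response of the classifier player
        ensemble.append(clf)
        y_pred = clf.predict(X)
        # Multiplicative-weights update: up-weight the group currently
        # receiving fewer positive predictions, so the next best response
        # is penalized more for ignoring it.
        disadvantaged = 1 if y_pred[a == 1].mean() < y_pred[a == 0].mean() else 0
        w *= np.exp(eta * (a == disadvantaged))
        w /= w.sum()
    return ensemble
```

A uniform vote over the returned ensemble plays the role of the randomized classifier whose robustness the paper analyzes; dp_gap is the quantity the weight player tries to inflate.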
Open Source Code | Yes | Our code is available at this GitHub repo: https://github.com/essdeee/Ensuring-Fairness-Beyond-the-Training-Data.
Open Datasets | Yes | We used the following four datasets for our experiments. Adult. In this dataset [24], each example represents an adult individual... Communities and Crime. In this dataset from the UCI repository [29]... Law School. We used a preprocessed and balanced subset with 1,823 examples and 17 features [33]. COMPAS. We used a 2,000-example sample from the full dataset. For Adult, Communities and Crime, and Law School, we used the preprocessed versions found in the accompanying GitHub repo of [22]. For COMPAS, we used a sample from the original dataset [1].
Dataset Splits | Yes | In order to evaluate different fair classifiers, we first split each dataset into five different random 80%-20% train-test splits. Then, we split each training set further into 80%-20% train and validation sets. Therefore, there were five random sets of 64%-16%-20% train-validation-test splits.
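The split protocol is straightforward to reproduce with scikit-learn. A minimal sketch, assuming the feature matrix X and labels y of one dataset are already loaded (the per-split seeds here are an assumption; the paper only says the splits were random):

```python
from sklearn.model_selection import train_test_split

# Five random 80%-20% train-test splits; each training portion is split
# 80%-20% again, giving a 64%-16%-20% train-validation-test split overall.
splits = []
for seed in range(5):
    X_trval, X_test, y_trval, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_trval, y_trval, test_size=0.20, random_state=seed)
    splits.append((X_train, y_train, X_val, y_val, X_test, y_test))
```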
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, cloud instances) used for running experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions using 'scikit-learn's logistic regression [27]' but does not provide specific version numbers for scikit-learn or other software dependencies.
Experiment Setup | Yes | To find the correct hyper-parameters (B, ν, T, and Tm) for our algorithm, we fixed T = 10 for EO, and T = 5 for DP, and used grid search for the hyper-parameters B, ν, and Tm. The tested values were {0.1, 0.2, . . . , 1} for B, {0, 0.05, . . . , 1} for ν, and {100, 200, . . . , 2000} for Tm.
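The reported grid is small enough to enumerate exhaustively. A sketch of the search loop, where evaluate_on_validation is a hypothetical helper that trains a classifier with the given hyper-parameters and scores it on the validation split:

```python
import itertools
import numpy as np

# Grid from the paper: B in {0.1, 0.2, ..., 1}, nu in {0, 0.05, ..., 1},
# Tm in {100, 200, ..., 2000}; T is held fixed (10 for EO, 5 for DP).
B_grid = np.round(np.arange(0.1, 1.01, 0.1), 2)
nu_grid = np.round(np.arange(0.0, 1.01, 0.05), 2)
Tm_grid = range(100, 2001, 100)

best_score, best_params = -np.inf, None
for B, nu, Tm in itertools.product(B_grid, nu_grid, Tm_grid):
    score = evaluate_on_validation(B=B, nu=nu, T=10, Tm=Tm)  # hypothetical helper
    if score > best_score:
        best_score, best_params = score, (B, nu, Tm)
```

This enumerates 10 × 21 × 20 = 4,200 configurations per dataset and fairness notion.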