Fair Learning with Private Demographic Data

Authors: Hussein Mozannar, Mesrob Ohannessian, Nathan Srebro

ICML 2020

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
"5.3. Experimental Illustration. Data. We use the adult income data set (Kohavi, 1996) containing 48,842 examples. The task is to predict whether a person's income is higher than $50k. Each data point has 14 features, including education and occupation; the protected attribute A we use is gender: male or female. Approach. We use a logistic regression model for classification. For the reductions approach, we use the implementation in the fairlearn package. We set T = 50, η = 2.0 and B = 100 for all experiments. We split the data into 75% for training and 25% for testing. We repeat the splitting over 10 trials. Effect of privacy. We plot in Figure 1 the resulting discrimination violation and model accuracy against increasing privacy levels ε for the predictor Ŷ resulting from step 1, trained on all the training data, and the two-step predictor Ỹ trained on S1 and S2. We observe that Ỹ achieves lower discrimination than Ŷ across the different privacy levels. This comes at a cost of lower accuracy, which improves at lower privacy regimes (large ε). The predictor of step 1 only begins to suffer on accuracy when the fairness constraint is void at high levels of privacy (small ε)."

Researcher Affiliation: Academia
"1 IDSS, Massachusetts Institute of Technology, MA, USA; 2 Department of Electrical and Computer Engineering, University of Illinois at Chicago, IL, USA; 3 Toyota Technological Institute, IL, USA."

Pseudocode: Yes
"Algorithm 1: Exp. gradient reduction for fair classification (Agarwal et al., 2018)"

Open Source Code: Yes
"Code to reproduce Figure 1 is publicly available at https://github.com/husseinmozannar/fairlearn_private_data"

Open Datasets: Yes
"Data. We use the adult income data set (Kohavi, 1996) containing 48,842 examples."

Dataset Splits: Yes
"We split the data into 75% for training and 25% for testing."

Hardware Specification: No
The paper does not specify hardware details such as CPU or GPU models or memory used for the experiments.

Software Dependencies: No
The paper mentions using the fairlearn package but does not give its version number or any other pinned software dependencies.

Experiment Setup: Yes
"We set T = 50, η = 2.0 and B = 100 for all experiments. We split the data into 75% for training and 25% for testing. We repeat the splitting over 10 trials."
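The quoted setup (exponentiated-gradient reduction with T = 50, η = 2.0, B = 100) can be illustrated with a minimal NumPy sketch of the Agarwal et al. (2018) Algorithm 1 that the review cites, under a demographic-parity constraint. The synthetic data, the finite family of threshold classifiers standing in for the cost-sensitive learning oracle, and all variable names below are assumptions for illustration only; the paper itself runs fairlearn's implementation with a logistic regression learner on the Adult data.

```python
import numpy as np

# Toy stand-in for the paper's setup: one feature correlated with a binary
# protected attribute, so the unconstrained classifier has a large
# demographic-parity gap.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, n)                     # protected attribute (e.g. gender)
x = rng.normal(loc=0.8 * a, size=n)           # feature correlated with the group
y = (x + rng.normal(scale=0.5, size=n) > 0.4).astype(int)

# Candidate classifiers: thresholds on x. The cost-sensitive best-response
# oracle of Agarwal et al. is simplified to an argmin over these candidates.
thresholds = np.linspace(-2.0, 2.0, 41)
preds = (x[None, :] > thresholds[:, None]).astype(int)   # shape (41, n)
err = (preds != y[None, :]).mean(axis=1)                 # 0/1 error per candidate
# Demographic-parity gap per candidate: P(h=1 | a=1) - P(h=1 | a=0).
gap = preds[:, a == 1].mean(axis=1) - preds[:, a == 0].mean(axis=1)

T, eta, B = 50, 2.0, 100.0     # hyperparameters quoted from the paper
theta = np.zeros(2)            # one coordinate per signed DP constraint (zero slack)
Q = np.zeros(len(thresholds))  # empirical mixture over the chosen classifiers

for t in range(T):
    w = np.exp(theta)
    lam = B * w / (1.0 + w.sum())      # multipliers on +gap <= 0 and -gap <= 0
    h = np.argmin(err + (lam[0] - lam[1]) * gap)   # best response to lam
    Q[h] += 1.0
    theta += eta * np.array([gap[h], -gap[h]])     # exponentiated-gradient step

Q /= T
gap_mix = float(Q @ gap)           # DP gap of the randomized classifier
err_mix = float(Q @ err)           # error of the randomized classifier
h_star = int(np.argmin(err))       # unconstrained error minimizer, for contrast
```

On this toy data the mixture Q attains a much smaller demographic-parity gap than the unconstrained error minimizer at the price of higher error, mirroring the accuracy/discrimination trade-off the review quotes from Figure 1 of the paper.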