Fair Learning with Private Demographic Data

Authors: Hussein Mozannar, Mesrob Ohannessian, Nathan Srebro

ICML 2020

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
"5.3. Experimental Illustration. Data. We use the adult income data set (Kohavi, 1996) containing 48,842 examples. The task is to predict whether a person's income is higher than $50k. Each data point has 14 features, including education and occupation; the protected attribute A we use is gender: male or female. Approach. We use a logistic regression model for classification. For the reductions approach, we use the implementation in the fairlearn package. We set T = 50, η = 2.0 and B = 100 for all experiments. We split the data into 75% for training and 25% for testing. We repeat the splitting over 10 trials. Effect of privacy. We plot in Figure 1 the resulting discrimination violation and model accuracy against increasing privacy levels ε for the predictor Ŷ resulting from step 1, trained on all the training data, and the two-step predictor Ỹ trained on S1 and S2. We observe that Ỹ achieves lower discrimination than Ŷ across the different privacy levels. This comes at a cost of lower accuracy, which improves at lower privacy regimes (large ε). The predictor of step 1 only begins to suffer on accuracy when the fairness constraint is void at high levels of privacy (small ε)."

Researcher Affiliation: Academia
"1 IDSS, Massachusetts Institute of Technology, MA, USA; 2 Department of Electrical and Computer Engineering, University of Illinois at Chicago, IL, USA; 3 Toyota Technological Institute, IL, USA."

Pseudocode: Yes
"Algorithm 1: Exp. gradient reduction for fair classification (Agarwal et al., 2018)"

Open Source Code: Yes
"Code to reproduce Figure 1 is publicly available at https://github.com/husseinmozannar/fairlearn_private_data"

Open Datasets: Yes
"Data. We use the adult income data set (Kohavi, 1996) containing 48,842 examples."

Dataset Splits: Yes
"We split the data into 75% for training and 25% for testing."

Hardware Specification: No
The paper does not specify hardware details such as CPU or GPU models or memory used for the experiments.

Software Dependencies: No
The paper mentions using the fairlearn package but does not give its version number or any other pinned software dependencies.

Experiment Setup: Yes
"We set T = 50, η = 2.0 and B = 100 for all experiments. We split the data into 75% for training and 25% for testing. We repeat the splitting over 10 trials."
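The quoted setup (exponentiated-gradient reduction with T = 50, η = 2.0, B = 100) can be illustrated with a minimal NumPy sketch of the Agarwal et al. (2018) Algorithm 1 that the review cites, under a demographic-parity constraint. The synthetic data, the finite family of threshold classifiers standing in for the cost-sensitive learning oracle, and all variable names below are assumptions for illustration only; the paper itself runs fairlearn's implementation with a logistic regression learner on the Adult data.

```python
import numpy as np

# Toy stand-in for the paper's setup: one feature correlated with a binary
# protected attribute, so the unconstrained classifier has a large
# demographic-parity gap.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, n)                     # protected attribute (e.g. gender)
x = rng.normal(loc=0.8 * a, size=n)           # feature correlated with the group
y = (x + rng.normal(scale=0.5, size=n) > 0.4).astype(int)

# Candidate classifiers: thresholds on x. The cost-sensitive best-response
# oracle of Agarwal et al. is simplified to an argmin over these candidates.
thresholds = np.linspace(-2.0, 2.0, 41)
preds = (x[None, :] > thresholds[:, None]).astype(int)   # shape (41, n)
err = (preds != y[None, :]).mean(axis=1)                 # 0/1 error per candidate
# Demographic-parity gap per candidate: P(h=1 | a=1) - P(h=1 | a=0).
gap = preds[:, a == 1].mean(axis=1) - preds[:, a == 0].mean(axis=1)

T, eta, B = 50, 2.0, 100.0     # hyperparameters quoted from the paper
theta = np.zeros(2)            # one coordinate per signed DP constraint (zero slack)
Q = np.zeros(len(thresholds))  # empirical mixture over the chosen classifiers

for t in range(T):
    w = np.exp(theta)
    lam = B * w / (1.0 + w.sum())      # multipliers on +gap <= 0 and -gap <= 0
    h = np.argmin(err + (lam[0] - lam[1]) * gap)   # best response to lam
    Q[h] += 1.0
    theta += eta * np.array([gap[h], -gap[h]])     # exponentiated-gradient step

Q /= T
gap_mix = float(Q @ gap)           # DP gap of the randomized classifier
err_mix = float(Q @ err)           # error of the randomized classifier
h_star = int(np.argmin(err))       # unconstrained error minimizer, for contrast
```

On this toy data the mixture Q attains a much smaller demographic-parity gap than the unconstrained error minimizer at the price of higher error, mirroring the accuracy/discrimination trade-off the review quotes from Figure 1 of the paper.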