Fair Learning with Private Demographic Data
Authors: Hussein Mozannar, Mesrob Ohannessian, Nathan Srebro
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5.3. Experimental Illustration. Data. We use the adult income data set (Kohavi, 1996) containing 48,842 examples. The task is to predict whether a person's income is higher than $50k. Each data point has 14 features including education and occupation, the protected attribute A we use is gender: male or female. Approach. We use a logistic regression model for classification. For the reductions approach, we use the implementation in the fairlearn package. We set T = 50, η = 2.0 and B = 100 for all experiments. We split the data into 75% for training and 25% for testing. We repeat the splitting over 10 trials. Effect of privacy. We plot in Figure 1 the resulting discrimination violation and model accuracy against increasing privacy levels ϵ for the predictor Ŷ resulting from step 1, trained on all the training data, and the two-step predictor Ỹ trained on S1 and S2. We observe that Ỹ achieves lower discrimination than Ŷ across the different privacy levels. This comes at a cost of lower accuracy, which improves at lower privacy regimes (large epsilon). The predictor of step 1 only begins to suffer on accuracy when the fairness constraint is void at high levels of privacy (small epsilon). |
| Researcher Affiliation | Academia | ¹IDSS, Massachusetts Institute of Technology, MA, USA; ²Department of Electrical and Computer Engineering, University of Illinois at Chicago, IL, USA; ³Toyota Technological Institute, IL, USA. |
| Pseudocode | Yes | Algorithm 1 Exp. gradient reduction for fair classification (Agarwal et al., 2018) |
| Open Source Code | Yes | Code to reproduce Figure 1 is publicly available. https://github.com/husseinmozannar/fairlearn_private_data |
| Open Datasets | Yes | Data. We use the adult income data set (Kohavi, 1996) containing 48,842 examples. |
| Dataset Splits | Yes | We split the data into 75% for training and 25% for testing. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU, GPU models, or memory used for the experiments. |
| Software Dependencies | No | The paper mentions using the "fairlearn package" but does not provide its version number or any other specific software dependencies with versions. |
| Experiment Setup | Yes | We set T = 50, η = 2.0 and B = 100 for all experiments. We split the data into 75% for training and 25% for testing. We repeat the splitting over 10 trials. |
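
The split protocol quoted in the Experiment Setup row (75% training, 25% testing, repeated over 10 trials on 48,842 examples) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the names `ExperimentConfig` and `make_splits`, the seeding scheme, and the rounding of the training-set size are assumptions made for the example. The values T, η, and B are carried along only as recorded settings and are not passed to any fairlearn call here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Settings quoted in the table; the field names are hypothetical."""
    n_examples: int = 48_842      # size of the adult income data set
    train_fraction: float = 0.75  # 75% train, 25% test
    n_trials: int = 10            # the split is repeated over 10 trials
    T: int = 50                   # reported setting for the reduction
    eta: float = 2.0              # reported setting (η)
    B: float = 100.0              # reported setting


def make_splits(config, base_seed=0):
    """Return one (train_indices, test_indices) pair per trial.

    Assumption: each trial reshuffles all indices with its own seed, and
    the training-set size is rounded down.
    """
    n_train = int(config.train_fraction * config.n_examples)
    splits = []
    for trial in range(config.n_trials):
        rng = random.Random(base_seed + trial)
        indices = list(range(config.n_examples))
        rng.shuffle(indices)
        splits.append((indices[:n_train], indices[n_train:]))
    return splits


config = ExperimentConfig()
splits = make_splits(config)
```

Each pair in `splits` indexes one train/test partition. Because the paper reports no random seeds, a reproduction along these lines would match the protocol but not necessarily the exact partitions behind Figure 1.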