Differential Privacy has Bounded Impact on Fairness in Classification

Authors: Paul Mangold, Michaël Perrot, Aurélien Bellet, Marc Tommasi

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we numerically illustrate the upper bounds from Section 4.2. We use the CelebA (Liu et al., 2015) and folktables (Ding et al., 2021) datasets... In Table 1, we compute the value of Theorem 4.4's bounds. We learn a non-private ℓ2-regularized logistic regression model, and use it to compute the bounds..."
Researcher Affiliation | Academia | "Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France."
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is available at https://github.com/pmangold/fairness-privacy."
Open Datasets | Yes | "We use the CelebA (Liu et al., 2015) and folktables (Ding et al., 2021) datasets... For each dataset, we use 90% of the records for training... The CelebA dataset... can be downloaded at http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, and the folktables dataset... can be downloaded using a Python package available here https://github.com/zykls/folktables." (see the data-loading sketch after the table)
Dataset Splits | No | "For each dataset, we use 90% of the records for training, and the remaining 10% for empirical evaluation of the bounds."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | "On each dataset, for each value of n, we train an ℓ2-regularized logistic regression model using scikit-learn (Pedregosa et al., 2011)."
Experiment Setup | Yes | "We train ℓ2-regularized logistic regression models, ensuring that the underlying optimization problem is 1-strongly-convex. This allows learning private models by output perturbation, for which the bound from Theorem 4.4 holds. ... For each value of n and ϵ, we plot Theorem 4.4's theoretical guarantees... For the plots with different numbers of training records, we train 20 non-private models with a number of records logarithmically spaced between 10 and the number of records in the complete training set... For the plots with different privacy budgets, we use 20 values logarithmically spaced between 10^-3 and 10 for both datasets." (see the output-perturbation sketch after the table)
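
To make the data pipeline concrete, here is a minimal sketch of loading a folktables task and reproducing the 90/10 split quoted above. The specific task (ACSIncome), survey year, and state are illustrative assumptions; the quoted excerpts do not specify them.

```python
# Minimal sketch: load a folktables task and make a 90/10 train/test split.
# The task (ACSIncome), survey year, and state are illustrative assumptions,
# not necessarily the configuration used in the paper.
from folktables import ACSDataSource, ACSIncome
from sklearn.model_selection import train_test_split

# Download the 2018 1-year ACS person survey for one state.
data_source = ACSDataSource(survey_year="2018", horizon="1-Year", survey="person")
acs_data = data_source.get_data(states=["AL"], download=True)

# Convert the raw survey records into (features, labels, group) arrays.
X, y, group = ACSIncome.df_to_numpy(acs_data)

# 90% of records for training, 10% for empirical evaluation of the bounds.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.9)
```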
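The experiment-setup row describes output perturbation on a 1-strongly-convex ℓ2-regularized logistic regression objective. The sketch below shows a generic Gaussian-mechanism instance of output perturbation, not the authors' code: the sensitivity 2/n assumes a 1-Lipschitz per-sample loss (e.g., features scaled to unit norm) and 1-strong convexity, following the standard calibration of Chaudhuri et al. (2011), and `private_logreg` is a hypothetical helper name. The paper's exact mechanism and constants may differ.

```python
# Minimal sketch of output perturbation for l2-regularized logistic
# regression, under illustrative assumptions: 1-Lipschitz per-sample loss
# and a 1-strongly-convex (average-loss) objective, giving l2-sensitivity
# 2 / n for the non-private minimizer.
import numpy as np
from sklearn.linear_model import LogisticRegression

def private_logreg(X_train, y_train, epsilon, delta=1e-5):
    n = X_train.shape[0]
    # scikit-learn's C is the inverse regularization strength; C = 1/n makes
    # the average-loss objective 1-strongly-convex (illustrative choice).
    # fit_intercept=False keeps the sensitivity analysis to the weights only.
    clf = LogisticRegression(C=1.0 / n, fit_intercept=False)
    clf.fit(X_train, y_train)
    theta = clf.coef_.ravel()

    # Gaussian noise calibrated for (epsilon, delta)-DP; this standard
    # calibration is valid for epsilon <= 1.
    sensitivity = 2.0 / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return theta + np.random.normal(scale=sigma, size=theta.shape)
```

The sweep over privacy budgets quoted in the table (20 values logarithmically spaced between 10^-3 and 10) can then be written as, e.g., `np.logspace(-3, 1, 20)`.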