Differential Privacy Has Disparate Impact on Model Accuracy

Authors: Eugene Bagdasaryan, Omid Poursaeed, Vitaly Shmatikov

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate this effect for (1) gender classification, already notorious for bias in the existing models [7], and age classification on facial images, where DP-SGD degrades accuracy for the darker-skinned faces more than for the lighter-skinned ones; (2) sentiment analysis of tweets, where DP-SGD disproportionately degrades accuracy for users writing in African-American English; (3) species classification on the iNaturalist dataset, where DP-SGD disproportionately degrades accuracy for the underrepresented classes; and (4) federated learning of language models, where DP-SGD disproportionately degrades accuracy for users with bigger vocabularies.
Researcher Affiliation | Academia | Eugene Bagdasaryan (Cornell Tech, eugene@cs.cornell.edu); Omid Poursaeed (Cornell Tech, op63@cornell.edu); Vitaly Shmatikov (Cornell Tech, shmat@cs.cornell.edu)
Pseudocode | Yes | Algorithm 1: Differentially Private SGD (DP-SGD). A minimal illustrative sketch of the DP-SGD update step appears after this table.
Open Source Code | No | The paper builds on existing open-source frameworks (PyTorch and TensorFlow Privacy) but does not state that the code for its own methodology or experiments is publicly released.
Open Datasets | Yes | We use the recently released Flickr-based Diversity in Faces (DiF) dataset [27] and the UTKFace dataset [39] as another source of darker-skinned faces.
Dataset Splits | No | The paper mentions a 'test set' for gender classification and implies a training split, but it does not describe a separate validation split for hyperparameter tuning in any of the experiments.
Hardware Specification | Yes | We ran them on two NVidia Titan X GPUs.
Software Dependencies | No | The paper mentions using PyTorch [32] and TF Privacy [36], but does not provide specific version numbers for these libraries or for any other dependencies.
Experiment Setup | Yes | We use a ResNet18 model [18] with 11M parameters pre-trained on ImageNet and train using the Adam optimizer, 0.0001 learning rate, and batch size b = 256. We run 60 epochs of DP training... A configuration sketch also appears after this table.
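
The Pseudocode row refers to Algorithm 1, which is the standard DP-SGD procedure of Abadi et al.: clip each per-example gradient to a fixed L2 norm, sum, add Gaussian noise, and descend. Below is a minimal sketch of one such step in plain PyTorch; the function name, hyperparameter defaults, and the naive per-example loop are illustrative assumptions, not the paper's exact implementation (the paper used TF Privacy / PyTorch).

```python
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.01, clip_norm=1.0, noise_multiplier=1.1):
    """One DP-SGD step: clip each per-example gradient to L2 norm <= clip_norm,
    sum, add Gaussian noise with std = noise_multiplier * clip_norm, average,
    and take a gradient-descent step. Hyperparameter values are assumptions."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Naive per-example loop for clarity; real implementations vectorize this.
    # (Assumes the model tolerates batch size 1, i.e. no batch-statistics issues.)
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)  # clip to C
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(p) * (noise_multiplier * clip_norm)
            p.add_(s + noise, alpha=-lr / len(xs))
```

The privacy budget (epsilon, delta) spent across many such steps is tracked separately with an accountant (e.g., the moments accountant); TF Privacy, which the paper uses, provides both a DP optimizer and this accounting.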
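
The Experiment Setup row can be expressed as a short configuration sketch with standard torchvision components. This is an assumption-laden reconstruction, not the paper's code: the data pipeline, loss, and the DP wrapper (e.g., the step sketched above) are omitted.

```python
import torch
from torchvision import models

# Reported setup: ResNet18 (~11M parameters) pre-trained on ImageNet,
# Adam optimizer with learning rate 1e-4, batch size b = 256,
# and 60 epochs of DP training.
model = models.resnet18(pretrained=True)  # newer torchvision: weights=models.ResNet18_Weights.IMAGENET1K_V1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
BATCH_SIZE = 256
NUM_EPOCHS = 60
# DataLoader, loss function, and DP noise injection are omitted here;
# the paper does not specify library versions, so exact reproduction may vary.
```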