Large Language Models Can Be Strong Differentially Private Learners

Authors: Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this performance drop can be mitigated with (1) the use of large pretrained language models; (2) non-standard hyperparameters that suit DP optimization; and (3) finetuning objectives which are aligned with the pretraining procedure. With the above, we obtain NLP models that outperform state-of-the-art DP-trained models under the same privacy budget and strong non-private baselines by directly fine-tuning pretrained models with DP optimization on moderately-sized corpora.
Researcher Affiliation | Collaboration | 1 Stanford University, 2 Google Research; {lxuechen,tramer,pliang}@cs.stanford.edu, thashim@stanford.edu
Pseudocode | Yes | Algorithm 1 DP-Adam ... Algorithm 2 Adam Update (a minimal sketch of this recipe follows the table)
Open Source Code | Yes | Code to reproduce results can be found at https://github.com/lxuechen/private-transformers.
Open Datasets | Yes | All experiments in the paper are based on publicly available datasets. Links to these datasets are included in the main text and appendices. ... We build models for sentence classification and language generation tasks with datasets of modest sizes under (central/global) approximate-DP (also known as (ϵ, δ)-DP) (Dwork et al., 2014). ... We consider the datasets E2E (Novikova et al., 2017) and DART (Nan et al., 2020). (the (ϵ, δ)-DP definition is restated after the table)
Dataset Splits | Yes | We report the F1 score and perplexity on the validation split, and human-evaluated quality scores of generations. ... All numbers reported in Table 3 are obtained on the validation split.
Hardware Specification | Yes | For GPT-2-large, we were unable to fit single-example micro batches together with gradient accumulation with Opacus or JAX on a TITAN RTX GPU (24 GB of VRAM). ... These numbers are based on running with a single RTX 3090 with PyTorch==1.9.0.
Software Dependencies | Yes | The Opacus (Yousefpour et al., 2021) baseline is based on version 0.14.0 of the library. ... These numbers are based on running with a single RTX 3090 with PyTorch==1.9.0.
Experiment Setup | Yes | Table 4: Default hyperparameters for ablation studies. ... Table 5: Hyperparameter search range for different methods. ... Our experiments suggest that batch size is one of the most important hyperparameters to set correctly, and the dependence of the optimal batch size on learning rate and training epochs makes its selection complex. (a sketch of this batch-size coupling also follows the table)
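The Pseudocode row references the paper's Algorithm 1 (DP-Adam) and Algorithm 2 (Adam Update). For orientation only, the sketch below shows the standard DP-Adam recipe: per-example gradient clipping, Gaussian noise addition, then an ordinary Adam update. The toy linear model, batch, and hyperparameter values (clipping norm, noise multiplier, learning rate) are illustrative placeholders and are not taken from the paper's algorithms or tables.

```python
# Minimal DP-Adam sketch: clip per-example gradients, add Gaussian noise,
# then apply a standard Adam update. All values here are placeholders.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(16, 2)                      # toy stand-in for a transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

max_grad_norm = 0.1      # per-example clipping threshold C (placeholder)
noise_multiplier = 1.0   # sigma; in practice set via a privacy accountant

x = torch.randn(8, 16)                              # a batch of 8 toy examples
y = torch.randint(0, 2, (8,))

# 1) Per-example gradients, computed with a plain loop for clarity.
per_example_grads = []
for i in range(x.shape[0]):
    model.zero_grad()
    loss = F.cross_entropy(model(x[i : i + 1]), y[i : i + 1])
    loss.backward()
    per_example_grads.append([p.grad.detach().clone() for p in model.parameters()])

# 2) Clip each example's gradient to L2 norm at most C, then sum.
clipped_sum = [torch.zeros_like(p) for p in model.parameters()]
for grads in per_example_grads:
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (max_grad_norm / (total_norm + 1e-6)).clamp(max=1.0)
    for acc, g in zip(clipped_sum, grads):
        acc.add_(g, alpha=float(scale))

# 3) Add Gaussian noise with std sigma * C, average over the batch,
#    and hand the noisy gradient to the standard Adam update.
for p, acc in zip(model.parameters(), clipped_sum):
    noise = noise_multiplier * max_grad_norm * torch.randn_like(acc)
    p.grad = (acc + noise) / x.shape[0]
optimizer.step()
```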
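The Open Datasets row refers to approximate DP, i.e. (ϵ, δ)-DP (Dwork et al., 2014). For reference, the standard definition is restated below in the usual notation; the symbols M, D, D', and S are the conventional ones, not copied from the paper's text.

```latex
% Standard (\epsilon, \delta)-differential privacy (Dwork et al., 2014).
% A randomized mechanism M is (\epsilon, \delta)-DP if, for all pairs of
% adjacent datasets D, D' differing in one record and all measurable
% output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] + \delta .
\]
```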
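The Experiment Setup row notes that the optimal batch size depends on the learning rate and the number of training epochs. One reason for this coupling is that, under DP training, the batch size jointly determines the sampling rate and the number of noisy steps the privacy accountant charges for. The sketch below uses purely hypothetical grid values and a hypothetical corpus size (none of these numbers come from Tables 4 or 5); it only illustrates how such a coupled search could be laid out.

```python
# Hypothetical layout of a coupled (batch size, learning rate, epochs) search;
# the grid values and dataset size are illustrative placeholders.
from itertools import product

search_space = {
    "batch_size": [64, 256, 1024],
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "epochs": [3, 10, 30],
}

def sampling_rate(batch_size, dataset_size):
    """Sampling rate q = B / N used by the DP accountant."""
    return batch_size / dataset_size

for bs, lr, ep in product(*search_space.values()):
    q = sampling_rate(bs, dataset_size=42_000)  # placeholder corpus size
    steps = int(ep / q)                         # expected number of noisy update steps
    print(f"batch={bs:5d}  lr={lr:.0e}  epochs={ep:2d}  q={q:.4f}  steps={steps}")
```

Larger batches lower the number of noisy steps per epoch but raise the sampling rate, which is why batch size cannot be tuned in isolation from the learning rate and epoch budget.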