Large Language Models Can Be Strong Differentially Private Learners
Authors: Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this performance drop can be mitigated with (1) the use of large pretrained language models; (2) non-standard hyperparameters that suit DP optimization; and (3) finetuning objectives which are aligned with the pretraining procedure. With the above, we obtain NLP models that outperform state-of-the-art DP-trained models under the same privacy budget and strong non-private baselines by directly fine-tuning pretrained models with DP optimization on moderately-sized corpora. |
| Researcher Affiliation | Collaboration | 1 Stanford University, 2 Google Research; {lxuechen,tramer,pliang}@cs.stanford.edu, thashim@stanford.edu |
| Pseudocode | Yes | Algorithm 1 DP-Adam ... Algorithm 2 Adam Update (a hedged sketch of this update appears below the table) |
| Open Source Code | Yes | Code to reproduce results can be found at https://github.com/lxuechen/private-transformers. |
| Open Datasets | Yes | All experiments in the paper are based on publicly available datasets. Links to these datasets are included in the main text and appendices. ... We build models for sentence classification and language generation tasks with datasets of modest sizes under (central/global) approximate-DP (also known as (ϵ, δ)-DP) (Dwork et al., 2014). ... We consider the datasets E2E (Novikova et al., 2017) and DART (Nan et al., 2020). |
| Dataset Splits | Yes | We report the F1 score and perplexity on the validation split, and human evaluated quality scores of generations. ... All numbers reported in Table 3 are obtained on the validation split. |
| Hardware Specification | Yes | For GPT-2-large, we were unable to fit single-example micro batches together with gradient accumulation with Opacus or JAX on a TITAN RTX GPU (24 GBs of VRAM). ... These numbers are based on running with a single RTX 3090 with PyTorch==1.9.0. |
| Software Dependencies | Yes | The Opacus (Yousefpour et al., 2021) baseline is based on version 0.14.0 of the library... These numbers are based on running with a single RTX 3090 with PyTorch==1.9.0. |
| Experiment Setup | Yes | Table 4: Default hyperparameters for ablation studies. ... Table 5: Hyperparameter search range for different methods. ... Our experiments suggest that batch size is one of the most important hyperparameters to set correctly, and the dependence of the optimal batch size on learning rate and training epochs makes its selection complex. |
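
The paper's Algorithm 1 (DP-Adam) composes per-example gradient clipping and Gaussian noising with a standard Adam update (its Algorithm 2). The following PyTorch sketch is an illustration of that composition only, not the authors' implementation; the helper name `dp_adam_step`, the `state` dict, and the keyword names `max_grad_norm` and `noise_multiplier` are assumptions made here for readability. The authors' memory-efficient version lives in the `private-transformers` repository linked above.

```python
import torch

def dp_adam_step(params, per_example_grads, state, *,
                 lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 max_grad_norm=1.0, noise_multiplier=1.0):
    """One DP-Adam update (illustrative sketch, not the paper's code).

    per_example_grads: one tensor per parameter, shaped
        (batch_size, *param.shape), holding per-example gradients.
    """
    batch_size = per_example_grads[0].shape[0]
    beta1, beta2 = betas

    # Clip each example's *whole* gradient (concatenated across all
    # parameters) to norm <= max_grad_norm.
    flat = torch.cat([g.reshape(batch_size, -1) for g in per_example_grads], dim=1)
    clip_factors = (max_grad_norm / (flat.norm(dim=1) + 1e-6)).clamp(max=1.0)

    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    m_list = state.setdefault("m", [torch.zeros_like(p) for p in params])
    v_list = state.setdefault("v", [torch.zeros_like(p) for p in params])

    for p, g, m, v in zip(params, per_example_grads, m_list, v_list):
        clipped = g * clip_factors.view(-1, *([1] * (g.dim() - 1)))
        # Sum the clipped gradients, add calibrated Gaussian noise, then average.
        noisy_grad = (clipped.sum(dim=0)
                      + noise_multiplier * max_grad_norm * torch.randn_like(p)) / batch_size

        # Standard Adam moments with bias correction (Algorithm 2).
        m.mul_(beta1).add_(noisy_grad, alpha=1 - beta1)
        v.mul_(beta2).addcmul_(noisy_grad, noisy_grad, value=1 - beta2)
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        p.data.addcdiv_(m_hat, v_hat.sqrt() + eps, value=-lr)
```

In practice the per-example gradients would not be materialized naively as above; the paper's point is precisely that they can be clipped memory-efficiently (ghost clipping) or handled by a library such as Opacus.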
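
The hyperparameter discussion in the Experiment Setup row hinges on how batch size feeds into both the privacy accountant and the per-update noise. The toy arithmetic below uses made-up numbers (the dataset size is only roughly that of the E2E training split) and is not taken from the paper's tables; it just spells out the two quantities that the batch size controls.

```python
# Illustrative arithmetic only; none of these numbers come from the paper.
dataset_size = 42_000   # assumed, roughly the size of the E2E training split
batch_size = 1024       # hypothetical large batch
epochs = 10             # hypothetical

sampling_rate = batch_size / dataset_size        # q fed to the privacy accountant
num_steps = epochs * dataset_size // batch_size  # number of noisy updates

print(f"q = {sampling_rate:.4f}, steps = {num_steps}")
# Doubling the batch size doubles q but halves the step count, and (at a fixed
# noise multiplier) shrinks the relative noise in each averaged gradient, which
# is why the optimal batch size is entangled with the learning rate and epochs.
```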