Large Language Models Can Be Strong Differentially Private Learners

Authors: Xuechen Li, Florian Tramèr, Percy Liang, Tatsunori Hashimoto

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that this performance drop can be mitigated with (1) the use of large pretrained language models; (2) non-standard hyperparameters that suit DP optimization; and (3) finetuning objectives which are aligned with the pretraining procedure. With the above, we obtain NLP models that outperform state-of-the-art DP-trained models under the same privacy budget and strong non-private baselines by directly fine-tuning pretrained models with DP optimization on moderately-sized corpora.
Researcher Affiliation | Collaboration | 1 Stanford University, 2 Google Research; {lxuechen,tramer,pliang}@cs.stanford.edu, thashim@stanford.edu
Pseudocode | Yes | Algorithm 1 DP-Adam ... Algorithm 2 Adam Update (a minimal sketch of this recipe follows the table)
Open Source Code | Yes | Code to reproduce results can be found at https://github.com/lxuechen/private-transformers.
Open Datasets | Yes | All experiments in the paper are based on publicly available datasets. Links to these datasets are included in the main text and appendices. ... We build models for sentence classification and language generation tasks with datasets of modest sizes under (central/global) approximate-DP (also known as (ϵ, δ)-DP) (Dwork et al., 2014). ... We consider the datasets E2E (Novikova et al., 2017) and DART (Nan et al., 2020). (the (ϵ, δ)-DP definition is restated after the table)
Dataset Splits | Yes | We report the F1 score and perplexity on the validation split, and human-evaluated quality scores of generations. ... All numbers reported in Table 3 are obtained on the validation split.
Hardware Specification | Yes | For GPT-2-large, we were unable to fit single-example micro batches together with gradient accumulation with Opacus or JAX on a TITAN RTX GPU (24 GB of VRAM). ... These numbers are based on running with a single RTX 3090 with PyTorch==1.9.0.
Software Dependencies | Yes | The Opacus (Yousefpour et al., 2021) baseline is based on version 0.14.0 of the library. ... These numbers are based on running with a single RTX 3090 with PyTorch==1.9.0.
Experiment Setup | Yes | Table 4: Default hyperparameters for ablation studies. ... Table 5: Hyperparameter search range for different methods. ... Our experiments suggest that batch size is one of the most important hyperparameters to set correctly, and the dependence of the optimal batch size on learning rate and training epochs makes its selection complex. (a sketch of this batch-size coupling also follows the table)
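The Pseudocode row references the paper's Algorithm 1 (DP-Adam) and Algorithm 2 (Adam Update). For orientation only, the sketch below shows the standard DP-Adam recipe: per-example gradient clipping, Gaussian noise addition, then an ordinary Adam update. The toy linear model, batch, and hyperparameter values (clipping norm, noise multiplier, learning rate) are illustrative placeholders and are not taken from the paper's algorithms or tables.

```python
# Minimal DP-Adam sketch: clip per-example gradients, add Gaussian noise,
# then apply a standard Adam update. All values here are placeholders.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(16, 2)                      # toy stand-in for a transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

max_grad_norm = 0.1      # per-example clipping threshold C (placeholder)
noise_multiplier = 1.0   # sigma; in practice set via a privacy accountant

x = torch.randn(8, 16)                              # a batch of 8 toy examples
y = torch.randint(0, 2, (8,))

# 1) Per-example gradients, computed with a plain loop for clarity.
per_example_grads = []
for i in range(x.shape[0]):
    model.zero_grad()
    loss = F.cross_entropy(model(x[i : i + 1]), y[i : i + 1])
    loss.backward()
    per_example_grads.append([p.grad.detach().clone() for p in model.parameters()])

# 2) Clip each example's gradient to L2 norm at most C, then sum.
clipped_sum = [torch.zeros_like(p) for p in model.parameters()]
for grads in per_example_grads:
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = (max_grad_norm / (total_norm + 1e-6)).clamp(max=1.0)
    for acc, g in zip(clipped_sum, grads):
        acc.add_(g, alpha=float(scale))

# 3) Add Gaussian noise with std sigma * C, average over the batch,
#    and hand the noisy gradient to the standard Adam update.
for p, acc in zip(model.parameters(), clipped_sum):
    noise = noise_multiplier * max_grad_norm * torch.randn_like(acc)
    p.grad = (acc + noise) / x.shape[0]
optimizer.step()
```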
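The Open Datasets row refers to approximate DP, i.e. (ϵ, δ)-DP (Dwork et al., 2014). For reference, the standard definition is restated below in the usual notation; the symbols M, D, D', and S are the conventional ones, not copied from the paper's text.

```latex
% Standard (\epsilon, \delta)-differential privacy (Dwork et al., 2014).
% A randomized mechanism M is (\epsilon, \delta)-DP if, for all pairs of
% adjacent datasets D, D' differing in one record and all measurable
% output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\epsilon}\,\Pr[M(D') \in S] + \delta .
\]
```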
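The Experiment Setup row notes that the optimal batch size depends on the learning rate and the number of training epochs. One reason for this coupling is that, under DP training, the batch size jointly determines the sampling rate and the number of noisy steps the privacy accountant charges for. The sketch below uses purely hypothetical grid values and a hypothetical corpus size (none of these numbers come from Tables 4 or 5); it only illustrates how such a coupled search could be laid out.

```python
# Hypothetical layout of a coupled (batch size, learning rate, epochs) search;
# the grid values and dataset size are illustrative placeholders.
from itertools import product

search_space = {
    "batch_size": [64, 256, 1024],
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "epochs": [3, 10, 30],
}

def sampling_rate(batch_size, dataset_size):
    """Sampling rate q = B / N used by the DP accountant."""
    return batch_size / dataset_size

for bs, lr, ep in product(*search_space.values()):
    q = sampling_rate(bs, dataset_size=42_000)  # placeholder corpus size
    steps = int(ep / q)                         # expected number of noisy update steps
    print(f"batch={bs:5d}  lr={lr:.0e}  epochs={ep:2d}  q={q:.4f}  steps={steps}")
```

Larger batches lower the number of noisy steps per epoch but raise the sampling rate, which is why batch size cannot be tuned in isolation from the learning rate and epoch budget.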