When Does Differentially Private Learning Not Suffer in High Dimensions?

Authors: Xuechen Li, Daogao Liu, Tatsunori B. Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin-Tat Lee, Abhradeep Guha Thakurta

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that in private fine-tuning of large language models, gradients obtained during fine-tuning are mostly controlled by a few principal components.
Researcher Affiliation | Collaboration | Xuechen Li (Stanford University), Daogao Liu (University of Washington), Tatsunori Hashimoto (Stanford University), Huseyin A. Inan (Microsoft Research), Janardhan Kulkarni (Microsoft Research), Yin Tat Lee (University of Washington and Microsoft Research), Abhradeep Guha Thakurta (Google Research)
Pseudocode | Yes | Algorithm 1 presents the pseudocode.
Open Source Code | Yes | Code to reproduce our results can be found at https://github.com/lxuechen/private-transformers/tree/main/examples/classification/spectral_analysis.
Open Datasets | Yes | pretrained BERT [DCLT18] and GPT-2 [RNSS18, RWC+19] models can be fine-tuned... ResNets [HZRS16] and vision Transformers [DBK+20] can be fine-tuned to perform well for ImageNet classification... DistilRoBERTa [SDCW19, LOG+19] under ε = 8 and δ = 1/n^1.1 for sentiment classification on the SST-2 dataset [SPW+13].
Dataset Splits | Yes | We include additional experimental setup details in Appendix C... which were tuned on separate validation data.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only refers to 'large pretrained models' and 'fine-tuning gigantic parameter vectors'.
Software Dependencies | No | The paper mentions pretrained BERT, GPT-2, ResNets, vision Transformers, and DistilRoBERTa as models, but does not specify software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | Specifically, we fine-tune DistilRoBERTa [SDCW19, LOG+19] under ε = 8 and δ = 1/n^1.1 for sentiment classification on the SST-2 dataset [SPW+13]. We reformulate the label prediction problem as templated text prediction [LTLH21], and fine-tune only the query and value matrices in attention layers... we over-train by privately fine-tuning for r = 2 × 10^3 updates.
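The Research Type row quotes the paper's central empirical claim: gradients observed during fine-tuning are mostly controlled by a few principal components. The following is a minimal sketch of how such a check could be run, not the authors' spectral_analysis code; it assumes you have already stacked flattened gradient vectors collected during training into a matrix, and the random matrix below is only a hypothetical placeholder for real gradients.

```python
# Sketch: measure how much of the gradient "energy" lies in the top-k
# principal components of a matrix of flattened gradients (one row per step).
import torch

def top_k_spectral_fraction(grad_matrix: torch.Tensor, k: int = 10) -> float:
    """grad_matrix: (num_steps, num_params) stack of flattened gradients."""
    # Center rows so the decomposition reflects variation around the mean gradient.
    centered = grad_matrix - grad_matrix.mean(dim=0, keepdim=True)
    # Singular values of the centered gradient matrix, in descending order.
    singular_values = torch.linalg.svdvals(centered)
    energy = singular_values ** 2
    return (energy[:k].sum() / energy.sum()).item()

# Hypothetical usage with placeholder data (real gradients would come from
# fine-tuning, e.g., only the query/value parameters of a transformer).
grads = torch.randn(200, 100_000)
print(f"Top-10 components capture {top_k_spectral_fraction(grads, 10):.1%} of the energy")
```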
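The Experiment Setup row describes private fine-tuning under (ε = 8, δ = 1/n^1.1). A minimal, self-contained sketch of a DP-SGD update is shown below: per-example gradient clipping followed by Gaussian noise, assuming only the query/value parameters have requires_grad=True. The clipping norm and noise multiplier are illustrative placeholders; in the paper, the noise scale is calibrated to the target privacy budget by the linked private-transformers code, which is not reproduced here.

```python
# Sketch of one DP-SGD step: clip each example's gradient, sum, add noise, average.
import torch

def dp_sgd_step(model, loss_fn, inputs, targets, optimizer,
                max_grad_norm=1.0, noise_multiplier=0.8):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-example gradients via microbatches of size 1, clipped to max_grad_norm.
    for x, y in zip(inputs, targets):
        optimizer.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        total_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        clip_coef = torch.clamp(max_grad_norm / (total_norm + 1e-6), max=1.0)
        for acc, p in zip(summed, params):
            acc.add_(p.grad, alpha=clip_coef.item())

    # Gaussian noise scaled to the clipping norm, then average over the batch.
    for p, acc in zip(params, summed):
        noise = torch.randn_like(acc) * noise_multiplier * max_grad_norm
        p.grad = (acc + noise) / len(inputs)
    optimizer.step()
```

In the sketch, restricting training to the query and value matrices (as in the paper's setup) would amount to setting requires_grad=False on all other parameters before constructing the optimizer.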