When Does Differentially Private Learning Not Suffer in High Dimensions?

Authors: Xuechen Li, Daogao Liu, Tatsunori B. Hashimoto, Huseyin A. Inan, Janardhan Kulkarni, Yin-Tat Lee, Abhradeep Guha Thakurta

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that in private fine-tuning of large language models, gradients obtained during fine-tuning are mostly controlled by a few principal components.
Researcher Affiliation | Collaboration | Xuechen Li (Stanford University), Daogao Liu (University of Washington), Tatsunori Hashimoto (Stanford University), Huseyin A. Inan (Microsoft Research), Janardhan Kulkarni (Microsoft Research), Yin Tat Lee (University of Washington and Microsoft Research), Abhradeep Guha Thakurta (Google Research)
Pseudocode | Yes | Algorithm 1 presents the pseudocode.
Open Source Code | Yes | Code to reproduce our results can be found at https://github.com/lxuechen/private-transformers/tree/main/examples/classification/spectral_analysis.
Open Datasets | Yes | pretrained BERT [DCLT18] and GPT-2 [RNSS18, RWC+19] models can be fine-tuned... ResNets [HZRS16] and vision Transformers [DBK+20] can be fine-tuned to perform well for ImageNet classification... DistilRoBERTa [SDCW19, LOG+19] under ε = 8 and δ = 1/n^1.1 for sentiment classification on the SST-2 dataset [SPW+13].
Dataset Splits | Yes | We include additional experimental setup details in Appendix C... which were tuned on separate validation data.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. It only refers to 'large pretrained models' and 'fine-tuning gigantic parameter vectors'.
Software Dependencies | No | The paper mentions pretrained BERT, GPT-2, ResNets, vision Transformers, and DistilRoBERTa as models, but does not specify software dependencies with version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup | Yes | Specifically, we fine-tune DistilRoBERTa [SDCW19, LOG+19] under ε = 8 and δ = 1/n^1.1 for sentiment classification on the SST-2 dataset [SPW+13]. We reformulate the label prediction problem as templated text prediction [LTLH21], and fine-tune only the query and value matrices in attention layers... we over-train by privately fine-tuning for r = 2 × 10^3 updates.
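The Research Type row quotes the paper's central empirical claim: gradients observed during fine-tuning are mostly controlled by a few principal components. The following is a minimal sketch of how such a check could be run, not the authors' spectral_analysis code; it assumes you have already stacked flattened gradient vectors collected during training into a matrix, and the random matrix below is only a hypothetical placeholder for real gradients.

```python
# Sketch: measure how much of the gradient "energy" lies in the top-k
# principal components of a matrix of flattened gradients (one row per step).
import torch

def top_k_spectral_fraction(grad_matrix: torch.Tensor, k: int = 10) -> float:
    """grad_matrix: (num_steps, num_params) stack of flattened gradients."""
    # Center rows so the decomposition reflects variation around the mean gradient.
    centered = grad_matrix - grad_matrix.mean(dim=0, keepdim=True)
    # Singular values of the centered gradient matrix, in descending order.
    singular_values = torch.linalg.svdvals(centered)
    energy = singular_values ** 2
    return (energy[:k].sum() / energy.sum()).item()

# Hypothetical usage with placeholder data (real gradients would come from
# fine-tuning, e.g., only the query/value parameters of a transformer).
grads = torch.randn(200, 100_000)
print(f"Top-10 components capture {top_k_spectral_fraction(grads, 10):.1%} of the energy")
```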
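The Experiment Setup row describes private fine-tuning under (ε = 8, δ = 1/n^1.1). A minimal, self-contained sketch of a DP-SGD update is shown below: per-example gradient clipping followed by Gaussian noise, assuming only the query/value parameters have requires_grad=True. The clipping norm and noise multiplier are illustrative placeholders; in the paper, the noise scale is calibrated to the target privacy budget by the linked private-transformers code, which is not reproduced here.

```python
# Sketch of one DP-SGD step: clip each example's gradient, sum, add noise, average.
import torch

def dp_sgd_step(model, loss_fn, inputs, targets, optimizer,
                max_grad_norm=1.0, noise_multiplier=0.8):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-example gradients via microbatches of size 1, clipped to max_grad_norm.
    for x, y in zip(inputs, targets):
        optimizer.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        total_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        clip_coef = torch.clamp(max_grad_norm / (total_norm + 1e-6), max=1.0)
        for acc, p in zip(summed, params):
            acc.add_(p.grad, alpha=clip_coef.item())

    # Gaussian noise scaled to the clipping norm, then average over the batch.
    for p, acc in zip(params, summed):
        noise = torch.randn_like(acc) * noise_multiplier * max_grad_norm
        p.grad = (acc + noise) / len(inputs)
    optimizer.step()
```

In the sketch, restricting training to the query and value matrices (as in the paper's setup) would amount to setting requires_grad=False on all other parameters before constructing the optimizer.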