Differentially Private Fine-tuning of Language Models
Authors: Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally evaluate our methods for DP fine-tuning to demonstrate their utility, privacy, and parameter-efficiency. |
| Researcher Affiliation | Collaboration | 1 Sun Yat-sen University, 2 Microsoft Research Asia, 3 Microsoft Research, 4 Microsoft, 5 Cheriton School of Computer Science, University of Waterloo, 6 University of Washington |
| Pseudocode | No | The paper describes algorithms and methods in text and mathematical formulations but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks with structured steps. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/AnonymousAKES/Differentially-Private-Fine-tuning-of-Language-Models. |
| Open Datasets | Yes | We use RoBERTa models (Liu et al., 2019), which are pre-trained on public data collected from the web. We choose four downstream tasks: MNLI, QQP, QNLI, and SST-2 from GLUE (Wang et al., 2018), following Yu et al. (2021b). |
| Dataset Splits | Yes | The E2E dataset in Novikova et al. (2017) contains template-like information in the restaurant domain to be mapped to natural language with end-to-end training. The dataset consists of 42K training samples, 4.6K validation samples, and 4.6K test samples. |
| Hardware Specification | Yes | Table 2: Memory and speed comparison for RoBERTa-Large. ... The speed is measured by the wall-clock time for training one epoch of the SST-2 dataset on a single Tesla V100 GPU with gradient accumulation for batch size 2000. |
| Software Dependencies | No | The paper mentions optimizers like AdamW and privacy accounting methods like the PRV accountant of Gopi et al. (2021), but it does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | Hyperparameter choice: Given the large number of hyperparameter choices, e.g., the intermediate representation dimension, learning rate, weight decay, privacy parameter δ, and model size, an exhaustive grid search over all hyperparameters is expensive. Our hyperparameter choices are informed by prior work and are as follows. For privacy parameters, we use δ = 1e-5 for SST-2 and QNLI and δ = 1e-6 for QQP and MNLI due to their dataset sizes, and use noise multipliers 0.92, 0.83, 0.66 and 0.65 for SST-2, QNLI, QQP, and MNLI, respectively... The clipping threshold is 10 for all methods. The batch size is 2000. ... We train for 20 epochs using AdamW (Loshchilov & Hutter, 2019) with weight decay 1e-2 and search over four learning rates {5e-4, 1e-3, 2e-3, 5e-3}. (A hedged configuration sketch follows the table.) |
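
The hyperparameters quoted in the Experiment Setup row map onto a standard DP-SGD training loop. Below is a minimal sketch wiring those values together, assuming PyTorch with the Opacus library; the library choice, toy model, and synthetic data loader are illustrative assumptions, since the paper does not name its framework, and the sketch shows plain DP-SGD rather than the paper's parameter-efficient fine-tuning methods.

```python
# Hedged sketch: a DP-SGD fine-tuning loop using the quoted hyperparameters
# (clipping threshold 10, batch size 2000, 20 epochs, AdamW with weight decay
# 1e-2, SST-2 noise multiplier 0.92, delta = 1e-5). Opacus, the toy classifier
# head, and the random dataset are assumptions, not the paper's actual setup.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-in for a classifier head being fine-tuned (illustrative only).
model = torch.nn.Linear(768, 2)

# Placeholder data; in the paper this would be a GLUE task such as SST-2.
dataset = TensorDataset(torch.randn(2000, 768), torch.randint(0, 2, (2000,)))
loader = DataLoader(dataset, batch_size=2000)

# AdamW with weight decay 1e-2; learning rate drawn from the searched grid.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

# Attach DP-SGD: per-sample gradient clipping at threshold 10 and Gaussian
# noise with the SST-2 noise multiplier 0.92, tracked with a PRV accountant.
privacy_engine = PrivacyEngine(accountant="prv")
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=0.92,
    max_grad_norm=10.0,
)

criterion = torch.nn.CrossEntropyLoss()
for epoch in range(20):  # 20 epochs, as in the quoted setup
    for features, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()

# Report the privacy budget spent at delta = 1e-5 (the SST-2 setting).
print(privacy_engine.get_epsilon(delta=1e-5))
```

In this sketch the learning rate 1e-3 is one point from the searched grid {5e-4, 1e-3, 2e-3, 5e-3}; in practice one run per grid point would be trained and the best selected.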