Differentially Private Bias-Term Fine-tuning of Foundation Models
Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the problem of differentially private (DP) fine-tuning of large pre-trained models, a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraints, yet requires significant computational overhead or modifications to the network architecture. We propose differentially private bias-term fine-tuning (DP-BiTFiT), which matches the state-of-the-art accuracy of DP algorithms and the efficiency of standard BiTFiT. DP-BiTFiT is model-agnostic (not modifying the network architecture), parameter-efficient (only training about 0.1% of the parameters), and computation-efficient (almost removing the overhead caused by DP, in both time and space complexity). On a wide range of tasks, DP-BiTFiT is 2∼30× faster and uses 2∼8× less memory than DP full fine-tuning, even faster than standard full fine-tuning. This remarkable efficiency enables us to conduct DP fine-tuning on language and vision tasks with long-sequence texts and high-resolution images, which were computationally difficult using existing methods. We open-source our code at FastDP (https://github.com/awslabs/fast-differential-privacy). Tables 3, 4, 5, 6 and Figures 1, 3, 4 show quantitative results. |
| Researcher Affiliation | Collaboration | 1Amazon AI 2University of California, San Diego. |
| Pseudocode | Yes | Algorithm 1 DP Bias-Term Fine-Tuning (BiTFiT) |
| Open Source Code | Yes | We open-source our code at FastDP (https://github.com/awslabs/fast-differential-privacy). |
| Open Datasets | Yes | For text classification, we experiment on four datasets: MNLI(m)... QQP... QNLI... SST2.... For the E2E generation task, we experiment with GPT2 models.... For CIFAR10 and CIFAR100.... For CelebA.... Throughout this work, the text datasets are processed and loaded from Huggingface (Lhoest et al., 2021). We use the same setup as Li et al. (2021); Bu et al. (2022b). Our experiments are heavily based on the PrivateCNN (i.e., the MixGhostClip algorithm (Bu et al., 2022a)) and TIMM codebases. |
| Dataset Splits | No | The paper provides hyperparameters and discusses training settings (e.g., batch size, epochs, learning rate) and references external works for setup details, but it does not explicitly state the train/validation/test split percentages or sample counts for any of the datasets used. |
| Hardware Specification | Yes | To give a concrete example, we apply DP-BiTFiT to the RoBERTa-large model on the QQP dataset, following the same setting as Li et al. (2021) and using one 40GB A100 GPU. |
| Software Dependencies | No | The paper mentions several software tools and libraries, including PyTorch, Opacus, GhostClip, Private Transformers, FastGradClip, Private-Vision, Huggingface, TIMM, TensorFlow, and JAX. However, it does not provide specific version numbers for these software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | Appendix D provides 'Experiment details'. Specifically, Table 8 details 'Hyperparameters of text classification' including 'epoch', 'batch size', 'clipping threshold R', 'DP learning rate', 'non-DP learning rate', and 'max sequence length'. Table 9 lists 'Hyperparameters of E2E generation task' with similar details. Table 10 provides 'Hyperparameters of image classification task' including 'epoch', 'batch size', 'clipping threshold', 'DP learning rate', 'non-DP learning rate', 'learning rate decay', and 'normalizing data'. |
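The core idea behind BiTFiT, as described in the abstract quoted above, is to train only the bias terms (roughly 0.1% of the parameters) while freezing everything else. A minimal PyTorch sketch of that selection step is shown below; the `apply_bitfit` helper and the toy model are illustrative assumptions for this review, not the paper's FastDP implementation (which additionally adds DP per-sample clipping and noise).

```python
import torch
from torch import nn


def apply_bitfit(model: nn.Module) -> nn.Module:
    """Hypothetical helper: freeze all parameters except bias terms (BiTFiT)."""
    for name, param in model.named_parameters():
        # Keep gradients only for parameters named "...bias".
        param.requires_grad = name.endswith("bias")
    return model


# Toy transformer-style MLP block, just to count parameters.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.GELU(),
    nn.Linear(3072, 768),
    nn.LayerNorm(768),
)
apply_bitfit(model)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")
# prints: trainable fraction: 0.0975%
```

Even on this toy block the trainable fraction lands near the ~0.1% figure claimed in the abstract, since bias vectors scale with the hidden width while weight matrices scale with its square.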