DPZero: Private Fine-Tuning of Language Models without Backpropagation
Authors: Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The memory efficiency of DPZERO is demonstrated in privately fine-tuning RoBERTa and OPT on several downstream tasks. Our code is available at https://github.com/Liang137/DPZero. |
| Researcher Affiliation | Collaboration | 1) Department of Computer Science, ETH Zurich; 2) Amazon Search; 3) Paul G. Allen School of Computer Science and Engineering, University of Washington. |
| Pseudocode | Yes | Algorithm 1 (DPGD-0th) and Algorithm 2 (DPZERO) are provided. |
| Open Source Code | Yes | Our code is available at https://github.com/Liang137/DPZero. |
| Open Datasets | Yes | We provide empirical results on synthetic problems and private fine-tuning of language models for sentence classification and generation tasks. A thorough description of the experimental settings is available in Appendix B. All experiments are tested on a single NVIDIA GeForce RTX 3090 GPU with 24 GiB memory. Code is available at https://github.com/Liang137/DPZero. |
| Dataset Splits | Yes | We consider the few-shot scenario with 512 samples per class... The test set is also composed of 1000 randomly selected samples from the original test dataset. |
| Hardware Specification | Yes | All experiments are tested on a single NVIDIA GeForce RTX 3090 GPU with 24 GiB memory. |
| Software Dependencies | No | The paper states 'Our implementation of DPZERO utilizes the codebase provided by Malladi et al. (2023)', but does not specify software versions for programming languages, libraries, or frameworks such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We fix the total number of iterations to be 10000, the batch size to be 64, and the smoothing parameter λ = 10⁻³ for both DPZERO and the non-private zeroth-order baseline MeZO (Malladi et al., 2023). ... the learning rate to be 10⁻⁶ ... All results are averaged through three different random seeds {42, 13, 21}... |
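
To connect the pseudocode and experiment-setup rows above, the following is a minimal sketch of a DPZero-style update step, assuming the paper's description of Algorithm 2: a random direction shared across the batch, per-example clipping of the scalar finite difference, and Gaussian noise added to that single scalar. Only the smoothing parameter λ = 10⁻³, the learning rate 10⁻⁶, and the batch size come from the reported setup; `loss_fn`, `clip`, and `sigma` are illustrative placeholders, not the authors' exact implementation.

```python
import torch

def dpzero_step(params, loss_fn, batch, lam=1e-3, lr=1e-6, clip=1.0, sigma=1.0):
    """One DPZero-style update: shared direction, per-example scalar clipping, scalar noise."""
    # Shared random direction z ~ N(0, I) over all parameter tensors.
    z = [torch.randn_like(p) for p in params]

    def perturb(scale):
        # In-place shift of every parameter along the shared direction.
        with torch.no_grad():
            for p, zi in zip(params, z):
                p.add_(scale * lam * zi)

    # Per-example finite differences (zeroth-order directional derivatives).
    diffs = []
    for example in batch:
        perturb(+1)                          # theta + lam * z
        loss_plus = float(loss_fn(example))
        perturb(-2)                          # theta - lam * z
        loss_minus = float(loss_fn(example))
        perturb(+1)                          # back to theta
        diffs.append((loss_plus - loss_minus) / (2 * lam))

    # Clip each scalar difference, average, and add Gaussian noise to the scalar.
    clipped = [max(-clip, min(clip, d)) for d in diffs]
    noisy = (sum(clipped) + sigma * clip * torch.randn(()).item()) / len(batch)

    # Move along the shared direction, scaled by the noisy averaged scalar.
    with torch.no_grad():
        for p, zi in zip(params, z):
            p.add_(-lr * noisy * zi)
```

Because the noise is added to a single scalar rather than to a full gradient vector, the privacy noise does not scale with the model dimension, which is the dimension-independence property the paper emphasizes.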