Differentially Private Optimization on Large Model at Small Cost
Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The computational advantage of BK is supported by the complexity analysis as well as extensive experiments on vision and language tasks. Our implementation achieves state-of-the-art (SOTA) accuracy with very small extra cost: on GPT2 and at almost the same memory cost (< 1% overhead), BK has 1.03× the time complexity of the standard training (0.83× training speed in practice), and 0.61× the time complexity of the most efficient DP implementation (1.36× training speed in practice). We open-source the codebase for the BK algorithm at https://github.com/awslabs/fast-differential-privacy. |
| Researcher Affiliation | Collaboration | 1Amazon Web Services 2University of California, Santa Barbara. Correspondence to: Zhiqi Bu <zhiqibu@amazon.com>. |
| Pseudocode | Yes | Algorithm 1: Differentially private deep learning with BK; Algorithm 2: DP optimizer with BK or GhostClip; Algorithm 3: DP optimizer with BK or Opacus; Algorithm 4: DP optimizer with BK or standard optimizer; Algorithm 5: DP optimizer with BK-MixGhostClip or BK-MixOpt |
| Open Source Code | Yes | We open-source the codebase for the BK algorithm at https://github.com/awslabs/fast-differential-privacy. |
| Open Datasets | Yes | recent advances have shed light on the success of DP GPT2 (Li et al., 2021; Bu et al., 2022b; Yu et al., 2021), which achieves 64.6 BLEU score at strong privacy guarantee (ϵ = 3), on the text generation task using E2E restaurant review dataset. ... on computer vision tasks (ϵ = 2), DP vision transformers and ResNets have obtained 97.1%/86.2% accuracy on CIFAR10/100 by (Bu et al., 2022a) and over 81% accuracy on ImageNet by (De et al., 2022; Mehta et al., 2022). ... short-sequence datasets including GLUE (Wang et al., 2019) (e.g. SST2/QNLI/MNLI/QQP) and natural language generation datasets (e.g. E2E/DART)... For paragraph or document-level language tasks like WikiHop (Welbl et al., 2018) and TriviaQA (Joshi et al., 2017) |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, counts, or references to predefined splits) in the main text. It mentions using standard datasets like CIFAR10/100 and ImageNet, which typically have predefined splits, but the paper itself does not detail these splits. |
| Hardware Specification | Yes | Table 1. Efficiency of BK algorithm on DP tasks using one A100 GPU (same accuracy). |
| Software Dependencies | No | The paper mentions several software components like 'Pytorch (Paszke et al., 2019)', 'Tensorflow-Privacy (TFPrivacy) library in (Bu et al., 2021a)', 'Tensorflow 2 and the XLA compiler (Subramani et al., 2021)', 'JAX', and 'Opacus (Yousefpour et al., 2021)'. However, it does not provide specific version numbers for all key software dependencies needed to reproduce the experiments comprehensively. For example, while Pytorch is mentioned with a citation, an explicit version number (e.g., 'Pytorch 1.9') is not stated. |
| Experiment Setup | Yes | Figure 2. Speed and memory on MLP and CIFAR100 (images are flattened into vectors). Left to right: deep network (50 layers, width 1000, 50M parameters, batch size 128), shallow network (10 layers, width 1000, 10M parameters, batch size 128), and wide network (10 layers, width 5000, 250M parameters, batch size 128 or 1024; Opacus is OOM). ... Parameter: l-th layer weights W(l), number of layers L, noise level σ. ... privacy_engine = PrivacyEngine(model, epochs, batch_size, sample_size, target_epsilon, target_delta) |
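
As a small supplement to the Research Type row above, the arithmetic below converts the quoted time-complexity ratios into the corresponding relative training speeds; only the numbers quoted in that row are used.

```python
# Hedged arithmetic on the speed numbers quoted in the Research Type row:
# an "X× time complexity" ratio corresponds to roughly a 1/X relative speed.
bk_vs_standard_time = 1.03      # BK time / standard (non-DP) training time
bk_vs_best_dp_time  = 0.61      # BK time / most efficient prior DP implementation time

print(f"theoretical speed vs standard training: {1 / bk_vs_standard_time:.2f}x "
      f"(reported in practice: 0.83x)")
print(f"theoretical speed vs best prior DP implementation: {1 / bk_vs_best_dp_time:.2f}x "
      f"(reported in practice: 1.36x)")
```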
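The Pseudocode row above lists the BK (book-keeping) algorithm variants. As a rough, hedged illustration of the ghost-norm idea that BK-style per-sample clipping relies on, here is a minimal sketch for a single linear layer with vector (non-sequential) activations; the function name `dp_linear_grad` and all variable names are illustrative, not the paper's API.

```python
import torch

def dp_linear_grad(A, B, clip_norm=1.0, noise_multiplier=1.0):
    """Hedged sketch: clipped and noised weight gradient of one linear layer.

    A: (batch, d_in)  activations (layer input), saved during the forward pass
    B: (batch, d_out) gradients w.r.t. the layer output, from one backward pass
    For a linear layer, the per-sample weight gradient is the outer product
    B[i] A[i]^T, so its Frobenius norm factors as ||B[i]|| * ||A[i]||
    (the "ghost norm" identity); no per-sample gradient is materialized.
    """
    per_sample_norm = A.norm(dim=1) * B.norm(dim=1)            # (batch,)
    clip_factor = (clip_norm / (per_sample_norm + 1e-6)).clamp(max=1.0)
    # Weighted sum of per-sample gradients, again without instantiating them:
    # sum_i c_i * B[i] A[i]^T = (c * B)^T A
    clipped_grad = (clip_factor.unsqueeze(1) * B).T @ A        # (d_out, d_in)
    noise = noise_multiplier * clip_norm * torch.randn_like(clipped_grad)
    return clipped_grad + noise

# Toy usage
A = torch.randn(8, 16)   # batch of 8 activations
B = torch.randn(8, 4)    # matching output gradients
g = dp_linear_grad(A, B, clip_norm=1.0, noise_multiplier=1.1)
print(g.shape)           # torch.Size([4, 16])
```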
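The Experiment Setup row above quotes a `PrivacyEngine(...)` constructor from the open-sourced codebase. The sketch below shows how such an engine might be wired into a PyTorch training script, assuming an Opacus-0.x-style `attach(optimizer)` interface and the keyword names from the quoted snippet; consult the repository for the exact API.

```python
import torch
from fastDP import PrivacyEngine   # assumption: package name from the linked repository

# Hedged sketch around the PrivacyEngine call quoted in the Experiment Setup row.
# Keyword names follow the quoted snippet; attach(optimizer) is assumed, not confirmed.
model = torch.nn.Linear(784, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

privacy_engine = PrivacyEngine(
    model,
    epochs=3,
    batch_size=128,
    sample_size=50_000,       # number of training examples
    target_epsilon=3.0,
    target_delta=1e-5,
)
privacy_engine.attach(optimizer)  # after this, optimizer.step() applies DP-SGD updates

# Training then proceeds as usual:
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```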