Differentially Private Optimization on Large Model at Small Cost

Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The computational advantage of BK is supported by the complexity analysis as well as extensive experiments on vision and language tasks. Our implementation achieves state-of-the-art (SOTA) accuracy with very small extra cost: on GPT2 and at almost the same memory cost (< 1% overhead), BK has 1.03× the time complexity of the standard training (0.83× training speed in practice), and 0.61× the time complexity of the most efficient DP implementation (1.36× training speed in practice). We open-source the codebase for the BK algorithm at https://github.com/awslabs/fast-differential-privacy.
Researcher Affiliation | Collaboration | ¹Amazon Web Services, ²University of California, Santa Barbara. Correspondence to: Zhiqi Bu <zhiqibu@amazon.com>.
Pseudocode | Yes | Algorithm 1: Differentially private deep learning with BK; Algorithm 2: DP optimizer with BK or GhostClip; Algorithm 3: DP optimizer with BK or Opacus; Algorithm 4: DP optimizer with BK or Standard optimizer; Algorithm 5: DP optimizer with BK, BK-MixGhostClip, or BK-MixOpt.
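To make the book-keeping (BK) idea in Algorithm 1 concrete, below is a minimal sketch (not the authors' implementation) for a single linear layer on non-sequential data: the per-sample gradient norm is recovered from the cached layer input and output gradient (the ghost-norm trick), and the sum of clipped per-sample gradients is then formed by one weighted matrix multiplication, so per-sample gradients are never instantiated. In the full algorithm the clipping factor is computed from norms accumulated over all layers; the function name, the vanilla clipping rule, and the noise placement here are illustrative assumptions.

```python
import torch

def bk_linear_grad(a, g, C=1.0, sigma=1.0):
    """Sketch of BK for one linear layer (non-sequential data).
    a: [B, d_in] cached layer inputs; g: [B, d_out] gradients w.r.t. layer outputs.
    Returns a noisy sum of clipped per-sample weight gradients, shape [d_out, d_in]."""
    # Ghost norm: ||g_i a_i^T||_F^2 = ||a_i||^2 * ||g_i||^2, no per-sample gradient needed.
    per_sample_norm = (a.pow(2).sum(1) * g.pow(2).sum(1)).sqrt()
    # Per-sample clipping factor (vanilla clipping shown; the paper also covers automatic clipping).
    clip = torch.clamp(C / (per_sample_norm + 1e-12), max=1.0)
    # sum_i clip_i * g_i a_i^T computed as a single weighted matrix multiplication.
    summed_clipped = (clip.unsqueeze(1) * g).transpose(0, 1) @ a
    # Gaussian mechanism: noise scales with the clipping threshold C.
    return summed_clipped + sigma * C * torch.randn_like(summed_clipped)
```

In the BK algorithm this second multiplication is applied to every trainable layer within the same back-propagation, which is why the time and memory overhead over standard training stays small.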
Open Source Code | Yes | We open-source the codebase for the BK algorithm at https://github.com/awslabs/fast-differential-privacy.
Open Datasets | Yes | recent advances have shed light on the success of DP GPT2 (Li et al., 2021; Bu et al., 2022b; Yu et al., 2021), which achieves 64.6 BLEU score at strong privacy guarantee (ϵ = 3), on the text generation task using E2E restaurant review dataset. ... on computer vision tasks (ϵ = 2), DP vision transformers and ResNets have obtained 97.1%/86.2% accuracy on CIFAR10/100 by (Bu et al., 2022a) and over 81% accuracy on ImageNet by (De et al., 2022; Mehta et al., 2022). ... short-sequence datasets including GLUE (Wang et al., 2019) (e.g. SST2/QNLI/MNLI/QQP) and natural language generation datasets (e.g. E2E/DART)... For paragraph or document-level language tasks like WikiHop (Welbl et al., 2018) and TriviaQA (Joshi et al., 2017)
Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages, counts, or references to predefined splits) in the main text. It mentions using standard datasets like CIFAR10/100 and ImageNet, which typically have predefined splits, but the paper itself does not detail these splits.
Hardware Specification | Yes | Table 1. Efficiency of BK algorithm on DP tasks using one A100 GPU (same accuracy).
Software Dependencies | No | The paper mentions several software components like 'Pytorch (Paszke et al., 2019)', 'Tensorflow-Privacy (TFPrivacy) library in (Bu et al., 2021a)', 'Tensorflow 2 and the XLA compiler (Subramani et al., 2021)', 'JAX', and 'Opacus (Yousefpour et al., 2021)'. However, it does not provide specific version numbers for all key software dependencies needed to reproduce the experiments comprehensively. For example, while Pytorch is mentioned with a citation, an explicit version number (e.g., 'Pytorch 1.9') is not stated.
Experiment Setup | Yes | Figure 2. Speed and memory on MLP and CIFAR100 (images are flattened into vectors). Left to right: deep network (50 layers, width 1000, 50M parameters, batch size 128), shallow network (10 layers, width 1000, 10M parameters, batch size 128), and wide network (10 layers, width 5000, 250M parameters, batch size 128 or 1024; Opacus is OOM). ... Parameter: l-th layer weights W(l), number of layers L, noise level σ. ... privacy_engine = PrivacyEngine(model, epochs, batch_size, sample_size, target_epsilon, target_delta)
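For orientation, here is a minimal, hedged sketch of how the quoted PrivacyEngine call might be wired into a PyTorch training loop. The constructor arguments mirror the quoted snippet; the `from fastDP import PrivacyEngine` path and the `attach(optimizer)` step are assumptions based on the open-sourced codebase linked above, and the model, data, and hyperparameter values are placeholders.

```python
import torch
import torch.nn.functional as F
from fastDP import PrivacyEngine  # assumed import path for the open-sourced BK codebase

model = torch.nn.Linear(784, 10)                        # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

privacy_engine = PrivacyEngine(
    model,
    epochs=3,                 # placeholder values for the quoted arguments
    batch_size=256,
    sample_size=50000,
    target_epsilon=2.0,
    target_delta=1e-5,
)
privacy_engine.attach(optimizer)  # assumed Opacus-style hook: gradients are clipped and noised via BK

for step in range(10):            # dummy training loop with random data
    x = torch.randn(256, 784)
    y = torch.randint(0, 10, (256,))
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()               # BK book-keeps activations/output gradients during this pass
    optimizer.step()
```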