Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate on various language and vision tasks that automatic clipping outperforms or matches the state-of-the-art, and can be easily employed with minimal changes to existing codebases. We evaluate our automatic DP training on image classification, sentence classification, and table-to-text generation tasks. |
| Researcher Affiliation | Collaboration | Zhiqi Bu (AWS AI, zhiqibu@amazon.com); Yu-Xiang Wang (AWS AI and UC Santa Barbara, yuxiangw@cs.ucsb.edu); Sheng Zha (AWS AI, zhasheng@amazon.com); George Karypis (AWS AI, gkarypis@amazon.com) |
| Pseudocode | Yes | Algorithm 1 (Automatic Deep Learning with DP). Parameters: initial weights w₀, learning rate ηₜ, sampling probability p, number of iterations T. 1: Compute σ such that ε_Accountant(δ, σ, p, T) ≤ ε from any privacy accountant. 2: For iteration t = 1, ..., T: 3: Sample a batch Bₜ by including each data point i.i.d. with probability p. 4: Apply automatic clipping to the per-sample gradients {gᵢ}, i ∈ Bₜ: ĝᵢ = gᵢ / (‖gᵢ‖₂ + 0.01). 5: Add Gaussian noise to the sum of clipped gradients: ĝ = Σᵢ ĝᵢ + σ·N(0, I). 6: Update wₜ by any optimizer on the private gradient ĝ with learning rate ηₜ. A minimal PyTorch sketch of one such step is given after this table. |
| Open Source Code | Yes | Code for our experiments is available in the FastDP library: https://github.com/awslabs/fast-differential-privacy. |
| Open Datasets | Yes | For MNIST/Fashion-MNIST, we use the same setup as in [56, 68, 64] with a simple CNN. For CIFAR10, we use the same setup as in [68] with pretrained SimCLRv2 [13]. For ImageNette, a 10-class sub-task of ImageNet [18], we use the same setup as in [36] without the learning rate decay. For CelebA [45], the real human face dataset, we train ResNet9 [32] with group normalization to replace the batch normalization. On five benchmark language datasets (MNLI(m/mm) [72], QQP [34], QNLI [62], SST2 [67]), we compare our automatic DP training with re-parameterized gradient perturbation (RGP, [78]) and full-parameter finetuning (full, [41]) using RoBERTa models [44]. We compare our automatic DP training with a variety of fine-tuning methods for the table-to-text generation task on the E2E dataset [23]. |
| Dataset Splits | Yes | MNLI(m): MNLI-matched, the matched validation and test splits from the Multi-Genre Natural Language Inference Corpus. The datasets are processed and loaded from Huggingface [39], as described at https://huggingface.co/datasets/glue. We follow the same setup as [78] and [41]. Table 5: Hyperparameters of automatic clipping and Abadi's clipping, for sentence classification in Table 2 and Table 3, using either RoBERTa-base or RoBERTa-large. |
| Hardware Specification | No | No specific hardware (GPU/CPU models, memory) used for running the experiments is detailed in the paper. It mentions "large foundation models" like GPT2 and GPT3-175B, implying significant computation but without hardware specifics. |
| Software Dependencies | Yes | For Opacus [77] version 1.1.2 (latest), we can implement the all-layer automatic clipping by changing Lines 399-401 in https://github.com/pytorch/opacus/blob/main/opacus/optimizers/optimizer.py to... For ObJAX version 1.6.0 (latest), we can implement the automatic clipping in https://github.com/google/objax/blob/master/objax/privacy/dpsgd/gradient.py by changing Line 92 to... An illustrative sketch of this clip-factor substitution is given after this table. |
| Experiment Setup | Yes | Detailed settings including hyperparameters can be found in Appendix G. Table 5: Hyperparameters of automatic clipping and Abadi's clipping, for sentence classification in Table 2 and Table 3, using either RoBERTa-base or RoBERTa-large. Table 7: Hyperparameters of automatic clipping and Abadi's clipping, for the E2E generation task in Table 4. |
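
As a companion to the Pseudocode row above, here is a minimal PyTorch sketch of one training step of Algorithm 1. The toy model, dataset, and microbatch loop for per-sample gradients are illustrative assumptions, not the paper's implementation; the authors' FastDP library and the Opacus/ObJAX hooks cited in the paper obtain per-sample gradients far more efficiently.

```python
# Minimal sketch of one step of Algorithm 1 (automatic-clipping DP-SGD).
# The model, data, and per-sample-gradient microbatch loop are toy
# placeholders, not the paper's implementation.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(20, 2)                      # toy model (assumption)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

sigma = 1.0   # noise multiplier from a privacy accountant (Step 1)
p = 0.5       # Poisson sampling probability
gamma = 0.01  # stabilizer in automatic clipping (Step 4)

# Step 3: Poisson-sample a batch from a toy dataset.
X, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
mask = torch.rand(64) < p
Xb, yb = X[mask], y[mask]

# Step 4: per-sample gradients, each rescaled by 1 / (||g_i||_2 + gamma).
summed = [torch.zeros_like(prm) for prm in model.parameters()]
for xi, yi in zip(Xb, yb):
    model.zero_grad()
    loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0)).backward()
    grads = [prm.grad.detach().clone() for prm in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    for s, g in zip(summed, grads):
        s += g / (norm + gamma)               # automatic (per-sample) clipping

# Step 5: add Gaussian noise to the sum of clipped gradients.
private = [s + sigma * torch.randn_like(s) for s in summed]

# Step 6: hand the private gradient to any optimizer (rescaling by the
# batch size is a common implementation choice, not part of Algorithm 1).
model.zero_grad()
for prm, g in zip(model.parameters(), private):
    prm.grad = g
optimizer.step()
```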
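
For the Software Dependencies row, the snippet below illustrates the kind of one-line substitution the paper describes for Opacus 1.1.2 and ObJAX 1.6.0; the variable names are illustrative, not the libraries' actual identifiers. Abadi-style clipping rescales each per-sample gradient by min(1, R/‖gᵢ‖₂), whereas automatic clipping uses 1/(‖gᵢ‖₂ + 0.01) and thereby removes the clipping threshold R as a hyperparameter.

```python
# Illustration (not the libraries' actual source) of swapping the Abadi-style
# per-sample clip factor for the automatic one.
import torch

per_sample_norms = torch.tensor([0.3, 1.7, 5.0])  # ||g_i||_2 for a toy batch
max_grad_norm = 1.0                               # Abadi-style threshold R

# Abadi-style clipping: scale gradient i by min(1, R / ||g_i||_2).
abadi_factor = (max_grad_norm / (per_sample_norms + 1e-6)).clamp(max=1.0)

# Automatic clipping: scale gradient i by 1 / (||g_i||_2 + 0.01).
auto_factor = 1.0 / (per_sample_norms + 0.01)

print(abadi_factor)  # tensor([1.0000, 0.5882, 0.2000])
print(auto_factor)   # tensor([3.2258, 0.5848, 0.1996])
```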