Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger
Authors: Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate on various language and vision tasks that automatic clipping outperforms or matches the state-of-the-art, and can be easily employed with minimal changes to existing codebases. We evaluate our automatic DP training on image classification, sentence classification, and table-to-text generation tasks. |
| Researcher Affiliation | Collaboration | Zhiqi Bu (AWS AI, zhiqibu@amazon.com); Yu-Xiang Wang (AWS AI and UC Santa Barbara, yuxiangw@cs.ucsb.edu); Sheng Zha (AWS AI, zhasheng@amazon.com); George Karypis (AWS AI, gkarypis@amazon.com) |
| Pseudocode | Yes | Algorithm 1 (Automatic Deep Learning with DP). Parameters: initial weights w₀, learning rate ηₜ, sampling probability p, number of iterations T. 1: Compute σ such that ε_Accountant(δ, σ, p, T) ≤ ε from any privacy accountant. 2: For iteration t = 1, ..., T: 3: Sample a batch Bₜ by including each data point i.i.d. with probability p. 4: Apply automatic clipping to the per-sample gradients {gᵢ}, i ∈ Bₜ: ĝᵢ = gᵢ / (‖gᵢ‖₂ + 0.01). 5: Add Gaussian noise to the sum of clipped gradients: ĝ = Σᵢ ĝᵢ + σ·N(0, I). 6: Update wₜ by any optimizer on the private gradient ĝ with learning rate ηₜ. A minimal PyTorch sketch of one such step is given after this table. |
| Open Source Code | Yes | Code for our experiments is available in the FastDP library: https://github.com/awslabs/fast-differential-privacy. |
| Open Datasets | Yes | For MNIST/Fashion-MNIST, we use the same setup as in [56, 68, 64] with a simple CNN. For CIFAR10, we use the same setup as in [68] with pretrained SimCLRv2 [13]. For ImageNette, a 10-class sub-task of ImageNet [18], we use the same setup as in [36] without the learning rate decay. For CelebA [45], the real human face dataset, we train ResNet9 [32] with group normalization to replace the batch normalization. On five benchmark language datasets (MNLI(m/mm) [72], QQP [34], QNLI [62], SST2 [67]), we compare our automatic DP training with re-parameterized gradient perturbation (RGP, [78]) and full-parameter finetuning (full, [41]) using RoBERTa models [44]. We compare our automatic DP training with a variety of fine-tuning methods for the table-to-text generation task on the E2E dataset [23]. |
| Dataset Splits | Yes | MNLI(m): MNLI-matched, the matched validation and test splits from the Multi-Genre Natural Language Inference Corpus. The datasets are processed and loaded from Huggingface [39], as described at https://huggingface.co/datasets/glue. We follow the same setup as [78] and [41]. Table 5: Hyperparameters of automatic clipping and Abadi's clipping, for sentence classification in Table 2 and Table 3, using either RoBERTa-base or RoBERTa-large. |
| Hardware Specification | No | No specific hardware (GPU/CPU models, memory) used for running the experiments is detailed in the paper. It mentions "large foundation models" like GPT2 and GPT3-175B, implying significant computation but without hardware specifics. |
| Software Dependencies | Yes | For Opacus [77] version 1.1.2 (latest), we can implement the all-layer automatic clipping by changing Lines 399-401 in https://github.com/pytorch/opacus/blob/main/opacus/optimizers/optimizer.py to... For ObJAX version 1.6.0 (latest), we can implement the automatic clipping in https://github.com/google/objax/blob/master/objax/privacy/dpsgd/gradient.py by changing Line 92 to... An illustrative sketch of this clip-factor substitution is given after this table. |
| Experiment Setup | Yes | Detailed settings including hyperparameters can be found in Appendix G. Table 5: Hyperparameters of automatic clipping and Abadi's clipping, for sentence classification in Table 2 and Table 3, using either RoBERTa-base or RoBERTa-large. Table 7: Hyperparameters of automatic clipping and Abadi's clipping, for the E2E generation task in Table 4. |
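
As a companion to the Pseudocode row above, here is a minimal PyTorch sketch of one training step of Algorithm 1. The toy model, dataset, and microbatch loop for per-sample gradients are illustrative assumptions, not the paper's implementation; the authors' FastDP library and the Opacus/ObJAX hooks cited in the paper obtain per-sample gradients far more efficiently.

```python
# Minimal sketch of one step of Algorithm 1 (automatic-clipping DP-SGD).
# The model, data, and per-sample-gradient microbatch loop are toy
# placeholders, not the paper's implementation.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(20, 2)                      # toy model (assumption)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

sigma = 1.0   # noise multiplier from a privacy accountant (Step 1)
p = 0.5       # Poisson sampling probability
gamma = 0.01  # stabilizer in automatic clipping (Step 4)

# Step 3: Poisson-sample a batch from a toy dataset.
X, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
mask = torch.rand(64) < p
Xb, yb = X[mask], y[mask]

# Step 4: per-sample gradients, each rescaled by 1 / (||g_i||_2 + gamma).
summed = [torch.zeros_like(prm) for prm in model.parameters()]
for xi, yi in zip(Xb, yb):
    model.zero_grad()
    loss_fn(model(xi.unsqueeze(0)), yi.unsqueeze(0)).backward()
    grads = [prm.grad.detach().clone() for prm in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    for s, g in zip(summed, grads):
        s += g / (norm + gamma)               # automatic (per-sample) clipping

# Step 5: add Gaussian noise to the sum of clipped gradients.
private = [s + sigma * torch.randn_like(s) for s in summed]

# Step 6: hand the private gradient to any optimizer (rescaling by the
# batch size is a common implementation choice, not part of Algorithm 1).
model.zero_grad()
for prm, g in zip(model.parameters(), private):
    prm.grad = g
optimizer.step()
```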
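
For the Software Dependencies row, the snippet below illustrates the kind of one-line substitution the paper describes for Opacus 1.1.2 and ObJAX 1.6.0; the variable names are illustrative, not the libraries' actual identifiers. Abadi-style clipping rescales each per-sample gradient by min(1, R/‖gᵢ‖₂), whereas automatic clipping uses 1/(‖gᵢ‖₂ + 0.01) and thereby removes the clipping threshold R as a hyperparameter.

```python
# Illustration (not the libraries' actual source) of swapping the Abadi-style
# per-sample clip factor for the automatic one.
import torch

per_sample_norms = torch.tensor([0.3, 1.7, 5.0])  # ||g_i||_2 for a toy batch
max_grad_norm = 1.0                               # Abadi-style threshold R

# Abadi-style clipping: scale gradient i by min(1, R / ||g_i||_2).
abadi_factor = (max_grad_norm / (per_sample_norms + 1e-6)).clamp(max=1.0)

# Automatic clipping: scale gradient i by 1 / (||g_i||_2 + 0.01).
auto_factor = 1.0 / (per_sample_norms + 0.01)

print(abadi_factor)  # tensor([1.0000, 0.5882, 0.2000])
print(auto_factor)   # tensor([3.2258, 0.5848, 0.1996])
```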