Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping
Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the performance of adaptive per-layer clipping with that of flat clipping. For both algorithms, we use hyperparameters suggested by De et al. (2022) and tune learning rates. We use a fraction r = 0.01 of privacy budget for quantile estimation and choose the target quantile q from {0.5, 0.6, 0.7}. For both algorithms we train for 300 epochs. We summarize the details in Appendix A.1. Table 2 shows that adaptive per-layer clipping achieves training and validation accuracies on par with flat clipping for multiple choices of ε. |
| Researcher Affiliation | Collaboration | 1University of Science and Technology of China, 2Stanford University, 3Sun Yat-sen University, 4Microsoft Research |
| Pseudocode | Yes | Algorithm 1 DP-SGD with adaptive per-layer clipping (an illustrative sketch follows the table) |
| Open Source Code | Yes | Code to reproduce some of our experiments can be found at https://github.com/lxuechen/perlayer-public. |
| Open Datasets | Yes | We train a wide ResNet (WRN16-4, 2.8M trainable parameters) (Zagoruyko & Komodakis, 2016) from scratch for CIFAR-10 classification with differential privacy. |
| Dataset Splits | Yes | To tune hyperparameters fairly, we split the training set of SST-2 into two parts: a new training set containing 80% of original training set and a validation set containing the remaining. We select the best hyperparameters with the performance on the validation set, averaging over 3 different seeds. |
| Hardware Specification | Yes | All experiments here are performed on a machine with a single Titan RTX GPU with 24 GB of VRAM (different from the configuration in Figure 1 which uses a single A6000 GPU). [...] For fine-tuning GPT-3 with DP LoRA on SAMSum, we used a machine with 16 V100 GPUs each with 32 gigabytes of VRAM. |
| Software Dependencies | No | The paper mentions software such as PyTorch, Hugging Face transformers, and Opacus, but does not provide version numbers for any of these dependencies, which are needed for exact reproducibility. |
| Experiment Setup | Yes | We set privacy parameter δ = 10^-5 and choose ε from {1, 3, 5, 8}, which are typical privacy parameters used in previous works. ... For both algorithms we train for 300 epochs. ... We set ε ∈ {3, 8} and δ = 1/n^1.1, where n is the size of the training set. We tune the learning rate, batch size, and target quantile on SST-2's training data... |
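
The Pseudocode row above cites Algorithm 1, DP-SGD with adaptive per-layer clipping. As a rough illustration of the group-wise clipping idea only, here is a minimal PyTorch sketch of one DP-SGD step with per-layer clipping under a fixed set of thresholds. All names (`per_sample_grads`, `clip_norms`, etc.) are hypothetical; this is not the authors' implementation, which additionally handles the adaptive threshold updates, privacy accounting, and the efficiency optimizations the paper is about.

```python
import torch

def dp_sgd_step_per_layer(per_sample_grads, clip_norms, noise_multiplier, params, lr):
    """One DP-SGD step with per-layer (group-wise) clipping -- illustrative sketch.

    per_sample_grads: list of tensors, one per layer, each of shape (B, *param_shape)
    clip_norms:       list of floats, one clipping threshold C_k per layer
    """
    batch_size = per_sample_grads[0].shape[0]
    noisy_grads = []
    for g, c in zip(per_sample_grads, clip_norms):
        # Per-example L2 norm of this layer's gradient.
        flat = g.reshape(batch_size, -1)
        norms = flat.norm(dim=1)                       # shape (B,)
        # Scale each example's layer gradient so its norm is at most c.
        scale = (c / (norms + 1e-12)).clamp(max=1.0)   # shape (B,)
        clipped = flat * scale.unsqueeze(1)
        # Sum over the batch, then add Gaussian noise calibrated to this layer's threshold.
        summed = clipped.sum(dim=0)
        noise = torch.randn_like(summed) * noise_multiplier * c
        noisy_grads.append((summed + noise) / batch_size)
    # Plain gradient-descent update with the privatized gradients.
    with torch.no_grad():
        for p, g in zip(params, noisy_grads):
            p -= lr * g.reshape(p.shape)
```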
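The Research Type row quotes the paper spending a fraction r = 0.01 of the privacy budget on quantile estimation for the adaptive thresholds, with target quantile q in {0.5, 0.6, 0.7}. The excerpt does not spell out the update rule; the sketch below assumes a geometric quantile-tracking update in the style of Andrew et al. (2021), with `step_size` and `noise_std` as purely illustrative parameters.

```python
import math
import random

def update_clip_norm(clip_norm, grad_norms, target_quantile=0.5,
                     step_size=0.2, noise_std=0.0):
    """Steer clip_norm toward a target quantile of per-example gradient norms.

    The privacy cost of the quantile estimate (r = 0.01 of the budget in the
    paper) is modeled here only as additive Gaussian noise on the count.
    """
    # Noisy fraction of examples whose gradient norm falls below the threshold.
    frac_below = sum(g <= clip_norm for g in grad_norms) / len(grad_norms)
    frac_below += random.gauss(0.0, noise_std)
    # Shrink the threshold if too many examples already fit, grow it otherwise.
    return clip_norm * math.exp(-step_size * (frac_below - target_quantile))
```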
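The Dataset Splits row describes holding out 20% of SST-2's training set as a validation set and averaging over 3 seeds. A minimal sketch of that protocol, assuming `sst2_train` is a list of examples and `evaluate` is a hypothetical function returning validation accuracy:

```python
import numpy as np

def split_train_val(examples, val_fraction=0.2, seed=0):
    """Split examples into train/validation (80/20, per the quoted protocol)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    n_val = int(len(examples) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return [examples[i] for i in train_idx], [examples[i] for i in val_idx]

# Average validation performance over 3 seeds, as in the quote.
scores = []
for seed in (0, 1, 2):
    train, val = split_train_val(sst2_train, seed=seed)
    scores.append(evaluate(train, val))  # hypothetical evaluation function
mean_score = float(np.mean(scores))
```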
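The Experiment Setup row ties δ to the training-set size for the text tasks (δ = 1/n^1.1, versus a fixed δ = 10^-5 for CIFAR-10) and sweeps ε over a small grid. A few lines showing the arithmetic; the value n = 67,349 is the commonly cited SST-2 training-set size, an assumption here rather than a number from the excerpt:

```python
# Privacy parameters from the setup row: epsilon in {3, 8}, delta = 1 / n**1.1.
n = 67_349          # standard SST-2 training-set size (assumed, not quoted above)
delta = 1 / n ** 1.1
epsilons = (3, 8)
print(f"delta = {delta:.2e}")  # ~4.88e-06, comfortably below 1/n
```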