Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping

Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare the performance of adaptive per-layer clipping with that of flat clipping. For both algorithms, we use hyperparameters suggested by De et al. (2022) and tune learning rates. We use a fraction r = 0.01 of the privacy budget for quantile estimation and choose the target quantile q from {0.5, 0.6, 0.7}. For both algorithms we train for 300 epochs. We summarize the details in Appendix A.1. Table 2 shows that adaptive per-layer clipping achieves training and validation accuracies on par with flat clipping for multiple choices of ε.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, 2 Stanford University, 3 Sun Yat-sen University, 4 Microsoft Research
Pseudocode | Yes | Algorithm 1: DP-SGD with adaptive per-layer clipping (a sketch of the core step appears after the table)
Open Source Code | Yes | Code to reproduce some of our experiments can be found at https://github.com/lxuechen/perlayer-public.
Open Datasets | Yes | We train a wide ResNet (WRN16-4, 2.8M trainable parameters) (Zagoruyko & Komodakis, 2016) from scratch for CIFAR-10 classification with differential privacy. (See the CIFAR-10 loading sketch after the table.)
Dataset Splits | Yes | To tune hyperparameters fairly, we split the training set of SST-2 into two parts: a new training set containing 80% of the original training set and a validation set containing the remaining 20%. We select the best hyperparameters based on performance on the validation set, averaging over 3 different seeds. (See the split sketch after the table.)
Hardware Specification | Yes | All experiments here are performed on a machine with a single Titan RTX GPU with 24 GB of VRAM (different from the configuration in Figure 1, which uses a single A6000 GPU). [...] For fine-tuning GPT-3 with DP LoRA on SAMSum, we used a machine with 16 V100 GPUs, each with 32 gigabytes of VRAM.
Software Dependencies | No | The paper mentions software such as PyTorch, Hugging Face transformers, and Opacus, but does not provide specific version numbers for any of these dependencies, which are needed for reproducibility. (See the version-logging sketch after the table.)
Experiment Setup | Yes | We set the privacy parameter δ = 10^-5 and choose ε from {1, 3, 5, 8}, which are typical privacy parameters used in previous works. [...] For both algorithms we train for 300 epochs. [...] We set ε ∈ {3, 8} and δ = 1/n^1.1, where n is the size of the training set. We tune the learning rate, batch size, and target quantile on SST-2's training data... (See the privacy-parameter sketch after the table.)
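
The pseudocode row refers to Algorithm 1 (DP-SGD with adaptive per-layer clipping). Below is a minimal PyTorch sketch of the core step, assuming per-sample gradients have already been materialized; the function and variable names are illustrative, not the authors' implementation, and the privatization of the clipping statistics (the fraction r of budget spent on quantile estimation) is only noted in comments.

```python
import math
import torch


def dp_sgd_step_per_layer(per_sample_grads, clip_thresholds, noise_multiplier, batch_size):
    """One DP-SGD update direction with per-layer clipping (sketch).

    per_sample_grads: dict layer_name -> tensor of shape (B, *param_shape)
    clip_thresholds:  dict layer_name -> clipping norm C_k for that layer
    Returns: dict layer_name -> noisy averaged gradient.
    """
    noisy_grads = {}
    for name, g in per_sample_grads.items():
        B = g.shape[0]
        C = clip_thresholds[name]
        # Per-sample norms of this layer's gradient.
        norms = g.reshape(B, -1).norm(dim=1)                      # shape (B,)
        # Scale each sample's layer gradient down to norm at most C.
        scale = (C / (norms + 1e-12)).clamp(max=1.0)
        clipped = g * scale.view(B, *([1] * (g.dim() - 1)))
        # Sum over the batch and add Gaussian noise calibrated to this layer's C.
        summed = clipped.sum(dim=0)
        noise = torch.randn_like(summed) * noise_multiplier * C
        noisy_grads[name] = (summed + noise) / batch_size
    return noisy_grads


def update_thresholds(clip_thresholds, unclipped_fractions, target_quantile=0.5, lr=0.2):
    """Adaptive per-layer threshold update (sketch).

    Moves each layer's threshold so that roughly a `target_quantile` fraction of
    per-sample layer gradients end up unclipped. In the paper the unclipped-fraction
    statistic is itself privatized using a small fraction r of the privacy budget;
    that noising step is omitted here for brevity.
    """
    return {
        name: C * math.exp(-lr * (unclipped_fractions[name] - target_quantile))
        for name, C in clip_thresholds.items()
    }
```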
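For the CIFAR-10 experiment, the dataset is directly available through torchvision. A minimal loading sketch follows; the normalization constants and batch size are illustrative and not taken from the paper.

```python
import torch
from torchvision import datasets, transforms

# Standard CIFAR-10 pipeline; augmentation/normalization choices here are illustrative.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
```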
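The 80/20 split of the SST-2 training set described in the dataset-splits row can be reproduced along these lines with the Hugging Face datasets library; the seed values here are illustrative.

```python
from datasets import load_dataset

# SST-2 from the GLUE benchmark.
sst2 = load_dataset("glue", "sst2")

# Carve a validation split out of the original training set: 80% for training,
# 20% held out for hyperparameter selection; the paper averages over 3 seeds.
for seed in (0, 1, 2):  # illustrative seeds
    split = sst2["train"].train_test_split(test_size=0.2, seed=seed)
    new_train, new_val = split["train"], split["test"]
    # ... tune hyperparameters on new_train, select by accuracy on new_val ...
```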
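Since the paper does not pin dependency versions, anyone re-running the released code would need to record their own environment. A small version-logging sketch:

```python
# Record the library versions actually used, since the paper does not pin them.
import sys
import torch
import transformers
import opacus

print("python       ", sys.version.split()[0])
print("torch        ", torch.__version__)
print("transformers ", transformers.__version__)
print("opacus       ", opacus.__version__)
print("cuda         ", torch.version.cuda)
```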
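The privacy parameters quoted in the experiment-setup row can be collected into a small configuration sketch; the dictionary layout and names are illustrative, not taken from the released code.

```python
# Privacy parameters quoted in the experiment setup (structure/names illustrative).
CIFAR10_PRIVACY = {
    "delta": 1e-5,
    "epsilons": [1, 3, 5, 8],  # typical values from prior work
    "epochs": 300,
}

def sst2_privacy(n_train: int) -> dict:
    # Text-classification runs set delta = 1 / n^1.1, where n is the training-set size.
    return {
        "epsilons": [3, 8],
        "delta": 1.0 / (n_train ** 1.1),
        "quantile_budget_fraction": 0.01,   # fraction r spent on quantile estimation
        "target_quantiles": [0.5, 0.6, 0.7],
    }
```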