Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping
Authors: Jiyan He, Xuechen Li, Da Yu, Huishuai Zhang, Janardhan Kulkarni, Yin Tat Lee, Arturs Backurs, Nenghai Yu, Jiang Bian
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare the performance of adaptive per-layer clipping with that of flat clipping. For both algorithms, we use hyperparameters suggested by De et al. (2022) and tune learning rates. We use a fraction r = 0.01 of privacy budget for quantile estimation and choose the target quantile q from {0.5, 0.6, 0.7}. For both algorithms we train for 300 epochs. We summarize the details in Appendix A.1. Table 2 shows that adaptive per-layer clipping achieves training and validation accuracies on par with flat clipping for multiple choices of $\epsilon$. |
| Researcher Affiliation | Collaboration | ¹University of Science and Technology of China, ²Stanford University, ³Sun Yat-sen University, ⁴Microsoft Research |
| Pseudocode | Yes | Algorithm 1 DP-SGD with adaptive per-layer clipping |
| Open Source Code | Yes | Code to reproduce some of our experiments can be found at https://github.com/lxuechen/perlayer-public. |
| Open Datasets | Yes | We train a wide ResNet (WRN16-4, 2.8M trainable parameters) (Zagoruyko & Komodakis, 2016) from scratch for CIFAR-10 classification with differential privacy. |
| Dataset Splits | Yes | To tune hyperparameters fairly, we split the training set of SST-2 into two parts: a new training set containing 80% of original training set and a validation set containing the remaining. We select the best hyperparameters with the performance on the validation set, averaging over 3 different seeds. |
| Hardware Specification | Yes | All experiments here are performed on a machine with a single Titan RTX GPU with 24 GB of VRAM (different from the configuration in Figure 1 which uses a single A6000 GPU). [...] For fine-tuning GPT-3 with DP LoRA on SAMSum, we used a machine with 16 V100 GPUs each with 32 gigabytes of VRAM. |
| Software Dependencies | No | The paper mentions software like PyTorch, Hugging Face transformers, and Opacus, but does not provide specific version numbers for any of these dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We set privacy parameter $\delta = 10^{-5}$ and choose $\epsilon$ from {1, 3, 5, 8}, which are typical privacy parameters used in previous works. ... For both algorithms we train for 300 epochs. ... We set $\epsilon \in \{3, 8\}$ and $\delta = 1/n^{1.1}$, where n is the size of training set. We tune the learning rate, batch size, and target quantile on SST-2's training data... |
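
The table names Algorithm 1 (DP-SGD with adaptive per-layer clipping) and mentions a target quantile q, but does not reproduce the procedure. The following is a minimal pure-Python sketch of the general idea, not the paper's Algorithm 1: each layer's per-sample gradient is clipped to its own threshold, Gaussian noise is added to the summed gradients, and each threshold is nudged toward the target quantile of per-sample layer norms. The function names, the geometric threshold update, and the choice to scale the noise by the combined sensitivity `sqrt(sum(C_k^2))` are assumptions made for illustration; privacy accounting, including the split of budget between gradients and quantile estimation, is omitted.

```python
import math
import random


def clip_per_layer(per_sample_grads, thresholds):
    """Clip every sample's gradient separately for each layer.

    per_sample_grads: list over samples -> list over layers -> list of floats.
    thresholds: one clipping threshold C_k per layer.
    Returns (clipped gradients, per-layer count of samples with norm <= C_k).
    """
    clipped = []
    within = [0] * len(thresholds)
    for sample in per_sample_grads:
        clipped_sample = []
        for k, (layer, c) in enumerate(zip(sample, thresholds)):
            norm = math.sqrt(sum(g * g for g in layer))
            if norm <= c:
                within[k] += 1
            # Rescale only when the layer norm exceeds its threshold.
            scale = min(1.0, c / norm) if norm > 0 else 1.0
            clipped_sample.append([g * scale for g in layer])
        clipped.append(clipped_sample)
    return clipped, within


def noisy_mean(clipped, thresholds, noise_multiplier, rng):
    """Sum clipped gradients, add Gaussian noise, and average over the batch.

    Assumption: noise std is noise_multiplier * sqrt(sum C_k^2), the L2
    sensitivity of the concatenated per-layer-clipped gradient.
    """
    sensitivity = math.sqrt(sum(c * c for c in thresholds))
    std = noise_multiplier * sensitivity
    n = len(clipped)
    result = []
    for k in range(len(thresholds)):
        dim = len(clipped[0][k])
        layer_sum = [sum(s[k][j] for s in clipped) for j in range(dim)]
        result.append([(v + rng.gauss(0.0, std)) / n for v in layer_sum])
    return result


def update_thresholds(thresholds, within, batch_size, target_q, lr,
                      count_noise_std, rng):
    """Move each threshold toward the target quantile of layer norms.

    Assumption: a geometric update on a noisy estimate of the fraction of
    samples whose layer norm is already within the threshold.
    """
    updated = []
    for c, count in zip(thresholds, within):
        noisy_fraction = (count + rng.gauss(0.0, count_noise_std)) / batch_size
        updated.append(c * math.exp(-lr * (noisy_fraction - target_q)))
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    # Two samples, two layers (toy values, not from the paper).
    grads = [[[3.0, 4.0], [0.1]], [[0.3, 0.4], [2.0]]]
    thresholds = [1.0, 1.0]
    clipped, within = clip_per_layer(grads, thresholds)
    update = noisy_mean(clipped, thresholds, noise_multiplier=1.0, rng=rng)
    thresholds = update_thresholds(thresholds, within, batch_size=2,
                                   target_q=0.6, lr=0.3,
                                   count_noise_std=0.5, rng=rng)
    print(update, thresholds)
```

When fewer samples than the target fraction fall within a layer's threshold, the update raises that threshold; when more do, it lowers it. This is the sense in which the clipping is adaptive.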