Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach
Authors: Xinwei Zhang, Zhiqi Bu, Steven Wu, Mingyi Hong
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results on standard datasets show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee. |
| Researcher Affiliation | Collaboration | Xinwei Zhang (University of Minnesota, zhan6234@umn.edu); Zhiqi Bu (Amazon AI, woodyx218@gmail.com); Zhiwei Steven Wu (Carnegie Mellon University, zstevenwu@cmu.edu); Mingyi Hong (University of Minnesota, mhong@umn.edu) |
| Pseudocode | Yes | Algorithm 1: DPSGD Algorithm with Gradient Clipping; Algorithm 2: DiceSGD Algorithm; Algorithm 3: Adam variant of DiceSGD Algorithm; Algorithm 4: Automatic DiceSGD Algorithm (without C1, C2). (An illustrative sketch of the clipping and error-feedback step appears below the table.) |
| Open Source Code | No | The paper does not contain any statement about releasing code or a link to a code repository. |
| Open Datasets | Yes | We use both CIFAR-10 and CIFAR-100 datasets for experiments and use ViT-small (Dosovitskiy et al., 2020) as the training model, which is pre-trained on ImageNet. |
| Dataset Splits | No | The paper mentions using CIFAR-10, CIFAR-100, and E2E NLG Challenge datasets. While these are standard benchmarks, the paper does not explicitly provide split percentages, sample counts, or direct citations to predefined split methodologies. It specifies fine-tuning settings and batch sizes, but not the train/validation/test split ratios. |
| Hardware Specification | Yes | The experiments were run on an Intel Xeon W-2102 CPU with an NVIDIA TITAN X GPU for image classification, and on an NVIDIA A100 GPU for NLP tasks. |
| Software Dependencies | No | The paper mentions using 'Adam variant of DPSGD-GC developed following Bu et al. (2021)' and 'GPT-2 model (Radford et al., 2018)', but does not provide specific software names with version numbers for replication (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We fine-tune the model for 3 epochs with batch size B = 1000. The stepsize for DPSGD-GC and DiceSGD is selected through grid search from η ∈ {10⁻², 10⁻³, 10⁻⁴}. For GPT-2: fine-tune the GPT-2 model (Radford et al., 2018) on the E2E NLG Challenge for 10 epochs with batch size B = 1000 and initial stepsize η₀ = 2 × 10⁻³ with learning rate warm-up and linear decay. (A sketch of such a schedule appears below the table.) |
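
The pseudocode row above names DPSGD with gradient clipping and the error-feedback DiceSGD variants. Below is a minimal, hypothetical sketch of a clipped, noised gradient step combined with an error-feedback buffer, written only to illustrate the general idea; the thresholds C1 and C2, the noise scale sigma, and the function name are assumptions, not the authors' exact Algorithm 2.

```python
import torch

def clipped_step_with_error_feedback(per_sample_grads, error_buf,
                                      C1=1.0, C2=1.0, sigma=1.0, lr=1e-3):
    """One illustrative DP-style update on flattened per-example gradients.

    per_sample_grads: (B, d) tensor, one row per example.
    error_buf:        (d,) tensor carrying gradient mass removed by clipping.
    """
    B, d = per_sample_grads.shape

    # Standard DPSGD-style per-sample clipping to norm at most C1.
    norms = per_sample_grads.norm(dim=1, keepdim=True).clamp(min=1e-12)
    clipped = per_sample_grads * torch.clamp(C1 / norms, max=1.0)

    # Error feedback (illustrative): accumulate what clipping removed,
    # release a C2-clipped portion of it, and carry the rest forward.
    carried = error_buf + (per_sample_grads - clipped).mean(dim=0)
    fed_back = carried * torch.clamp(C2 / carried.norm().clamp(min=1e-12), max=1.0)
    new_error_buf = carried - fed_back

    # Gaussian noise scaled to the per-sample clipping threshold.
    noise = torch.randn(d) * sigma * C1 / B

    update = clipped.mean(dim=0) + fed_back + noise
    return -lr * update, new_error_buf

# Tiny usage example with random per-example gradients.
g = torch.randn(8, 5)
delta, e = clipped_step_with_error_feedback(g, torch.zeros(5))
```

The buffer carries forward whatever the C2 clip did not release, which is the usual error-feedback pattern; the actual DiceSGD update rule and its privacy accounting are given in the paper's Algorithm 2.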
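
The experiment-setup row reports an initial stepsize of 2 × 10⁻³ with learning-rate warm-up and linear decay for GPT-2 fine-tuning. The following is a small sketch of such a schedule in PyTorch; the warm-up length, total step count, placeholder model, and optimizer choice are assumptions, not values reported in the paper.

```python
import torch

model = torch.nn.Linear(4, 2)                          # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=2e-3)   # initial stepsize 2e-3

warmup_steps, total_steps = 100, 1000                  # assumed, not from the paper

def warmup_then_linear(step):
    # Linear warm-up from 0 to the base stepsize, then linear decay to 0.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=warmup_then_linear)

for step in range(total_steps):
    opt.step()       # forward/backward pass omitted in this sketch
    sched.step()
```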