Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach

Authors: Xinwei Zhang, Zhiqi Bu, Steven Wu, Mingyi Hong

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results on standard datasets show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee.
Researcher Affiliation | Collaboration | Xinwei Zhang, University of Minnesota, zhan6234@umn.edu; Zhiqi Bu, Amazon AI, woodyx218@gmail.com; Zhiwei Steven Wu, Carnegie Mellon University, zstevenwu@cmu.edu; Mingyi Hong, University of Minnesota, mhong@umn.edu
Pseudocode | Yes | Algorithm 1: DPSGD Algorithm with Gradient Clipping; Algorithm 2: DiceSGD Algorithm; Algorithm 3: Adam variant of DiceSGD Algorithm; Algorithm 4: Automatic DiceSGD Algorithm (without C1, C2). An illustrative sketch of a clipped DPSGD step and a generic error-feedback step is given after the table.
Open Source Code | No | The paper does not contain any statement about releasing code or a link to a code repository.
Open Datasets | Yes | We use both CIFAR-10 and CIFAR-100 datasets for experiments and use ViT-small (Dosovitskiy et al., 2020) as the training model, which is pre-trained on ImageNet.
Dataset Splits | No | The paper mentions using the CIFAR-10, CIFAR-100, and E2E NLG Challenge datasets. While these are standard benchmarks, the paper does not explicitly state split percentages, sample counts, or cite a predefined split methodology; it discusses fine-tuning and batch size but not split ratios.
Hardware Specification | Yes | The experiments were run on an Intel Xeon W-2102 CPU with an NVIDIA TITAN X GPU for image classification, and on an NVIDIA A100 GPU for NLP tasks.
Software Dependencies | No | The paper mentions using the 'Adam variant of DPSGD-GC developed following Bu et al. (2021)' and the 'GPT-2 model (Radford et al., 2018)', but does not provide specific software names with version numbers for replication (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | We fine-tune the model for 3 epochs with batch size B = 1000. The stepsizes for DPSGD-GC and DiceSGD are selected through a grid search over η ∈ {10^-2, 10^-3, 10^-4}. For GPT-2: fine-tune the GPT-2 model (Radford et al., 2018) on the E2E NLG Challenge for 10 epochs with batch size B = 1000 and initial stepsize η0 = 2 × 10^-3 with learning-rate warm-up and linear decay. These settings are collected into a config sketch after the table.
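
To make the Pseudocode row concrete, below is a minimal NumPy sketch of one DPSGD step with per-sample gradient clipping (the table's Algorithm 1) alongside a generic error-feedback step that carries the residual removed by clipping into the next iteration. The error-feedback step illustrates the general idea only and is not a reproduction of the paper's DiceSGD (Algorithm 2); all variable names (clip_C, noise_mult, etc.) are our own.

```python
# Sketch of a clipped DPSGD step (Algorithm 1) and a generic error-feedback
# variant. The error-feedback step is an illustrative assumption, not the
# paper's exact DiceSGD (Algorithm 2).
import numpy as np

def dpsgd_gc_step(w, per_sample_grads, clip_C, noise_mult, lr, rng):
    """One DPSGD-GC update: clip each per-sample gradient to norm clip_C,
    average, and add Gaussian noise calibrated to the clipping threshold."""
    B = len(per_sample_grads)
    clipped = [g * min(1.0, clip_C / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    noisy_avg = np.mean(clipped, axis=0) + rng.normal(
        0.0, noise_mult * clip_C / B, size=w.shape)
    return w - lr * noisy_avg

def error_feedback_step(w, e, per_sample_grads, clip_C, noise_mult, lr, rng):
    """Generic error-feedback step: the part of the gradient removed by
    clipping is stored in the state e and re-injected at the next step,
    which is the high-level idea behind removing clipping bias."""
    B = len(per_sample_grads)
    fed_back = [g + e for g in per_sample_grads]      # re-inject past residual
    clipped = [g * min(1.0, clip_C / (np.linalg.norm(g) + 1e-12))
               for g in fed_back]
    avg_clipped = np.mean(clipped, axis=0)
    e_next = np.mean(fed_back, axis=0) - avg_clipped  # residual carried forward
    noisy_update = avg_clipped + rng.normal(
        0.0, noise_mult * clip_C / B, size=w.shape)
    return w - lr * noisy_update, e_next

# Toy usage with random per-sample gradients.
rng = np.random.default_rng(0)
w, e = np.zeros(10), np.zeros(10)
grads = [rng.normal(size=10) for _ in range(8)]
w = dpsgd_gc_step(w, grads, clip_C=1.0, noise_mult=1.0, lr=1e-3, rng=rng)
w, e = error_feedback_step(w, e, grads, clip_C=1.0, noise_mult=1.0, lr=1e-3, rng=rng)
```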
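
The hyperparameters quoted in the Experiment Setup row can also be collected into a small config sketch. Only the numbers quoted above (epochs, batch sizes, step-size grid, η0 = 2 × 10^-3) come from the paper; the warm-up fraction and every identifier in this snippet are assumptions for illustration.

```python
# Hedged config sketch of the reported experiment setup, plus a generic
# linear warm-up / linear decay schedule for the GPT-2 run.

CIFAR_FINETUNE = {
    "model": "ViT-small, pre-trained on ImageNet",
    "datasets": ["CIFAR-10", "CIFAR-100"],
    "epochs": 3,
    "batch_size": 1000,
    "stepsize_grid": [1e-2, 1e-3, 1e-4],  # grid-searched for DPSGD-GC and DiceSGD
}

GPT2_E2E_FINETUNE = {
    "model": "GPT-2",
    "dataset": "E2E NLG Challenge",
    "epochs": 10,
    "batch_size": 1000,
    "initial_stepsize": 2e-3,
}

def linear_warmup_linear_decay(step, total_steps, eta0=2e-3, warmup_frac=0.1):
    """Learning rate with linear warm-up followed by linear decay to zero.
    The 10% warm-up fraction is an assumption, not taken from the paper."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return eta0 * (step + 1) / warmup_steps
    remaining = max(1, total_steps - warmup_steps)
    return eta0 * max(0.0, (total_steps - step) / remaining)
```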