Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Improving Differentially Private SGD via Randomly Sparsified Gradients
Authors: Junyi Zhu, Matthew B. Blaschko
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with various DP-SGD frameworks show that RS can improve performance. Additionally, our theoretical analysis and empirical evaluations show that the trade-off is not trivial but possibly a unique property of DP-SGD, as either canceling noisification or gradient clipping eliminates the trade-off in the bound. In Section 6 we empirically verify our analysis and show the utility of our proposed algorithm. |
| Researcher Affiliation | Academia | Junyi Zhu EMAIL Center for Processing Speech and Images, Department of Electrical Engineering (ESAT) KU Leuven, Belgium Matthew B. Blaschko EMAIL Center for Processing Speech and Images, Department of Electrical Engineering (ESAT) KU Leuven, Belgium |
| Pseudocode | Yes | Algorithm 1 outlines our approach. RS is also compatible with SGD with gradient momentum and Adam (Kingma & Ba, 2014). In this section, we present a practically efficient and lightweight RS approach. The additional cost of running RS is negligible. |
| Open Source Code | Yes | Our code is available at https://github.com/JunyiZhu-AI/RandomSparsification. |
| Open Datasets | Yes | measured on DP-CNN with dataset CIFAR10. |
| Dataset Splits | No | In the implementation of many previous works (Papernot et al., 2021; Tramer & Boneh, 2021; Yu et al., 2021a), sampling is conducted by randomly shuffling and partitioning the dataset into batches of fixed size, we follow this convention. |
| Hardware Specification | No | We focus on DP image classification and run all experiments on a cluster within the same container environment 5 times using the same group of 5 random seeds. |
| Software Dependencies | No | Our code is implemented in PyTorch (Paszke et al., 2019b). To compute the gradients of an individual example in a mini-batch we use the BackPACK package (Dangel et al., 2020). Cumulative privacy loss has been tracked with the Opacus package, which adopts Rényi differential privacy (Mironov, 2017; Balle et al., 2020). |
| Experiment Setup | Yes | We set the batch size to 1000 and train for 100 epochs; the network is DP-CNN. The result shows that the trade-off favors RS as σC increases, except in the last Figure 2i, where due to excessive noise the network degrades after a few epochs. We adopt the best hyperparameters provided in previous works, then we do a grid search for baselines and RS over the clipping bound C ∈ {0.1, 0.5, 1}, where C = 0.1 is given in previous works, and epochs E ∈ {E*, 1.2E*, 1.5E*} (σ is adapted accordingly), where E* is the best given in previous works for the different frameworks. When conducting RS we use gradual cooling and search for the final sparsification rate r ∈ {0.5, 0.7, 0.9}. |
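The setup above combines DP-SGD (per-example clipping to bound C, Gaussian noise with multiplier σ) with random sparsification (RS) at rate r. A minimal NumPy sketch of one such update step is shown below; this is an illustrative reconstruction, not the authors' implementation (their code uses PyTorch with BackPACK and Opacus), and the function and parameter names are hypothetical. It assumes the RS mask is sampled independently of the data, so the sparsified coordinates carry no signal and need no noise.

```python
import numpy as np

def dp_sgd_step_with_rs(per_example_grads, C=0.1, sigma=1.0, r=0.7, rng=None):
    """One DP-SGD update direction with random sparsification (illustrative).

    per_example_grads: array of shape (B, d), one gradient per example.
    C: clipping bound; sigma: noise multiplier; r: sparsification rate
    (fraction of coordinates zeroed out by the shared random mask).
    """
    rng = np.random.default_rng() if rng is None else rng
    B, d = per_example_grads.shape

    # Shared random mask: zero out a fraction r of the coordinates.
    keep = rng.permutation(d) >= int(r * d)
    masked = per_example_grads * keep

    # Per-example L2 clipping to norm at most C.
    norms = np.linalg.norm(masked, axis=1, keepdims=True)
    clipped = masked * np.minimum(1.0, C / np.maximum(norms, 1e-12))

    # Sum over the batch, add Gaussian noise scaled by sigma * C on the
    # kept coordinates only, then average.
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma * C, size=d) * keep
    return noisy_sum / B
```

Because clipping happens after masking, the surviving coordinates can use more of the per-example norm budget C, which is one intuition for the trade-off the paper analyzes between discarded signal and reduced effective noise.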