Effectively Using Public Data in Privacy Preserving Machine Learning
Authors: Milad Nasr, Saeed Mahloujifar, Xinyu Tang, Prateek Mittal, Amir Houmansadr
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate the effectiveness of our approach in improving the state-of-the-art in DP machine learning across multiple datasets, network architectures, and application domains. |
| Researcher Affiliation | Collaboration | Google DeepMind, Princeton University, University of Massachusetts Amherst. |
| Pseudocode | Yes | Algorithm 1 DP-SGD with Adaptive Origin (DP-SGDA) (a hedged JAX sketch of one such step appears after this table) |
| Open Source Code | No | We implemented Algorithm 2 and the related works in JAX (Bradbury et al., 2018), and we also implemented Algorithm 2 in Opacus (Yousefpour et al., 2021) and the private-transformers library (Li et al., 2022b). |
| Open Datasets | Yes | CIFAR10 dataset |
| Dataset Splits | No | In our experiments, we first evaluated the effect of each individual setting; then, in cases where we did not specify the setting, the results represent the extended settings. Please note that we did hyper-parameter tuning for each setting (as detailed in Appendix A). |
| Hardware Specification | Yes | Training WRN40-4 on eight A100 GPUs in our setting takes more than 96 hours. |
| Software Dependencies | No | We implemented Algorithm 2 and the related works in JAX (Bradbury et al., 2018), and we also implemented Algorithm 2 in Opacus (Yousefpour et al., 2021) and the private-transformers library (Li et al., 2022b). (A minimal Opacus setup sketch appears after this table.) |
| Experiment Setup | Yes | Table 8 (hyper-parameters used in the tuning phase): Learning rate [1, 2, 3, 4, 5, 5.5, 6]; Noise multiplier [1, 2, 3, 4, 5, 8]; Public data sample size [80, 160, 640, 1280, 2560]; Clipping norm [0.5, 0.8, 1.0, 1.5]; Batch size [512, 1024, 2048, 4096]. (An enumeration sketch of this grid appears after this table.) |
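
The paper's Algorithm 1 (DP-SGDA) is only named in the table above. As a rough illustration of the idea, the following is a minimal JAX sketch of one DP-SGD step with an adaptive origin, assuming the origin is the mean gradient on a public batch, per-example private gradients are clipped relative to that origin, and Gaussian noise calibrated to the clipping norm is added before the origin is added back. The toy `loss_fn`, the data, and all hyper-parameter values are placeholders, not the authors' configuration.

```python
import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Toy squared-error loss on a linear model; stands in for the real model loss.
    pred = x @ params
    return jnp.mean((pred - y) ** 2)


def dp_sgda_step(params, pub_x, pub_y, priv_x, priv_y, key,
                 clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    # Adaptive origin: mean gradient on the public batch (no privacy cost).
    origin = jax.grad(loss_fn)(params, pub_x, pub_y)

    # Per-example gradients on the private batch.
    per_example = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, priv_x, priv_y)

    # Clip each (per-example gradient - origin) difference to clip_norm.
    diffs = per_example - origin
    norms = jnp.linalg.norm(diffs, axis=1)
    scale = jnp.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = diffs * scale[:, None]

    # Average the clipped differences, add Gaussian noise calibrated to the
    # clipping norm, and shift the result back by the public origin.
    n = priv_x.shape[0]
    noise = (noise_multiplier * clip_norm / n) * jax.random.normal(key, origin.shape)
    grad_estimate = origin + clipped.mean(axis=0) + noise

    return params - lr * grad_estimate


# Example usage with random placeholder data (5 features).
key = jax.random.PRNGKey(0)
params = jnp.zeros(5)
pub_x, pub_y = jax.random.normal(key, (32, 5)), jnp.zeros(32)
priv_x, priv_y = jax.random.normal(key, (256, 5)), jnp.zeros(256)
params = dp_sgda_step(params, pub_x, pub_y, priv_x, priv_y, key)
```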
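
The dependencies row quotes implementations in JAX and Opacus but no released code. Below is a minimal, hedged sketch of how Opacus's `PrivacyEngine` is typically attached to a PyTorch model, optimizer, and data loader; the linear model, random data, and hyper-parameter values are placeholders rather than the paper's WRN40-4 setup.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Placeholder model, optimizer, and data; not the paper's configuration.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(512, 10), torch.randint(0, 2, (512,)))
loader = DataLoader(dataset, batch_size=64)

# PrivacyEngine.make_private wraps the three objects so that optimizer.step()
# applies per-example gradient clipping and Gaussian noise.
engine = PrivacyEngine()
model, optimizer, loader = engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,   # placeholder; the paper tunes this value (Table 8)
    max_grad_norm=1.0,      # placeholder clipping norm
)
```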
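
The hyper-parameter grid quoted from Table 8 can be enumerated mechanically. The snippet below sketches that enumeration; `train_and_evaluate` is a hypothetical placeholder for whatever training routine the paper uses, not a function from the authors' code.

```python
from itertools import product

# Hyper-parameter grid quoted from Table 8 of the paper.
grid = {
    "learning_rate": [1, 2, 3, 4, 5, 5.5, 6],
    "noise_multiplier": [1, 2, 3, 4, 5, 8],
    "public_data_sample_size": [80, 160, 640, 1280, 2560],
    "clipping_norm": [0.5, 0.8, 1.0, 1.5],
    "batch_size": [512, 1024, 2048, 4096],
}

for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # score = train_and_evaluate(**config)  # hypothetical training call
    print(config)
```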