Effectively Using Public Data in Privacy Preserving Machine Learning

Authors: Milad Nasr, Saeed Mahloujifar, Xinyu Tang, Prateek Mittal, Amir Houmansadr

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results demonstrate the effectiveness of our approach in improving the state-of-the-art in DP machine learning across multiple datasets, network architectures, and application domains."
Researcher Affiliation | Collaboration | "¹Google DeepMind, ²Princeton University, ³University of Massachusetts Amherst."
Pseudocode | Yes | "Algorithm 1 DP-SGD with Adaptive Origin (DP-SGDA)" (a hedged sketch of this update step follows the table).
Open Source Code | No | "We implemented Algorithm 2 and the related works in JAX (Bradbury et al., 2018), and we implemented Algorithm 2 in Opacus (Yousefpour et al., 2021) and the private-transformers library (Li et al., 2022b)."
Open Datasets | Yes | "CIFAR10 dataset"
Dataset Splits | No | "In our experiments, we first evaluated the effect of each individual setting and then in cases where we did not specify the setting, the results represent the extended settings. Please note that we did hyper-parameter tuning for each setting (as detailed in Appendix A)."
Hardware Specification | Yes | "Training WRN40-4 on eight A100 in our setting takes more than 96 hours."
Software Dependencies | No | "We implemented Algorithm 2 and the related works in JAX (Bradbury et al., 2018), and we implemented Algorithm 2 in Opacus (Yousefpour et al., 2021) and the private-transformers library (Li et al., 2022b)."
Experiment Setup | Yes | "Table 8: Set of hyper-parameters used in the hyper-tuning phase." Learning rate: [1, 2, 3, 4, 5, 5.5, 6]; Noise multiplier: [1, 2, 3, 4, 5, 8]; Public data sample size: [80, 160, 640, 1280, 2560]; Clipping norm: [0.5, 0.8, 1.0, 1.5]; Batch size: [512, 1024, 2048, 4096] (restated as a grid sketch after the table).
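
The Pseudocode row above cites Algorithm 1, DP-SGD with Adaptive Origin (DP-SGDA). Below is a minimal JAX sketch of one such update step under a common reading of an "adaptive origin": per-example private gradients are clipped relative to a gradient computed on public data, noise is added, and the public-data gradient is added back. The function name `dp_sgda_step`, the assumption that `params` is a flat vector, and the `loss_fn(params, examples)` signature are illustrative choices, not the paper's verbatim Algorithm 1, and no privacy accounting is shown.

```python
import jax
import jax.numpy as jnp


def dp_sgda_step(params, private_examples, public_examples, loss_fn, key,
                 clip_norm=1.0, noise_multiplier=1.0, lr=0.1):
    """One hedged DP-SGD step whose clipping origin is a public-data gradient."""
    # "Origin": mean gradient on the public batch; computing it consumes
    # no privacy budget because the public data is not protected.
    public_grad = jax.grad(loss_fn)(params, public_examples)

    # Per-example gradients on the private batch, shape (batch, dim).
    per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0))(
        params, private_examples)

    # Clip the difference from the public origin rather than the raw gradient.
    diffs = per_example_grads - public_grad
    norms = jnp.linalg.norm(diffs, axis=1, keepdims=True)
    clipped = diffs * jnp.minimum(1.0, clip_norm / (norms + 1e-12))

    # Gaussian noise calibrated to the clipping norm, then shift back to the origin.
    batch_size = diffs.shape[0]
    noise = noise_multiplier * clip_norm * jax.random.normal(key, params.shape)
    dp_grad = public_grad + (clipped.sum(axis=0) + noise) / batch_size

    return params - lr * dp_grad
```

In the paper this step sits inside a full training loop with a privacy accountant; the sketch only shows how a public-gradient origin changes the clipping and noise step relative to standard DP-SGD.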
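
The hyper-parameter grid quoted from Table 8 can also be written as a plain configuration dictionary, which makes the size of the search space explicit. The key names below are chosen here for readability, and whether the paper sweeps the full cross product or only a subset for each setting is not stated in the excerpt.

```python
from itertools import product

# Search space from Table 8 of the paper (key names are ours).
HYPERPARAMETER_GRID = {
    "learning_rate": [1, 2, 3, 4, 5, 5.5, 6],
    "noise_multiplier": [1, 2, 3, 4, 5, 8],
    "public_data_sample_size": [80, 160, 640, 1280, 2560],
    "clipping_norm": [0.5, 0.8, 1.0, 1.5],
    "batch_size": [512, 1024, 2048, 4096],
}

# An exhaustive sweep would cover 7 * 6 * 5 * 4 * 4 = 3360 configurations.
configs = [dict(zip(HYPERPARAMETER_GRID, values))
           for values in product(*HYPERPARAMETER_GRID.values())]
print(len(configs))  # 3360
```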