Optimal Differentially Private Model Training with Public Data

Authors: Andrew Lowy, Zeman Li, Tianjian Huang, Meisam Razaviyayn

ICML 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Empirically, our algorithms show benefits over the state-of-the-art. Our experiments show that our algorithms outperform the naïve approaches, even when the optimal DP algorithm is pre-trained on the public data. For example, our Algorithm 1 achieves a significant improvement in CIFAR-10 image classification tasks, reducing test error by 8.9% for logistic regression and by at most 18.9% for WideResNet, compared with the naïve approaches." See also Section 4 (Numerical Experiments). |
| Researcher Affiliation | Academia | University of Wisconsin-Madison, Wisconsin Institute of Discovery, Madison, WI, USA; Department of Industrial & Systems Engineering, University of Southern California, Los Angeles, CA, USA. |
| Pseudocode | Yes | Algorithm 1 (Semi-DP-SGD via Weighted-Gaussian Gradient Estimation) on page 5. |
| Open Source Code | Yes | Code for all of the experiments is available at https://github.com/optimization-for-data-driven-science/DP-with-public-data. |
| Open Datasets | Yes | "We evaluate the performance of Algorithm 1 in training a logistic regression model to classify digits in the CIFAR-10 dataset (Krizhevsky et al., 2009)." |
| Dataset Splits | Yes | The synthetic dataset consists of n = 30,000 training samples, 7,500 validation samples, and 37,500 test samples. |
| Hardware Specification | No | The paper does not specify hardware details such as GPU/CPU models, memory, or cloud instances used to run the experiments. |
| Software Dependencies | No | The paper mentions the PyTorch privacy framework Opacus but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | "We set the private batch size K_priv = 500, public batch size K_pub = 200, and iterations T = 5000. All algorithms undergo extensive hyperparameter tuning using the validation dataset..." See Tables 1 and 2 in Appendix G.2 for detailed descriptions of the hyperparameter search grids. |
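The table references Algorithm 1 (Semi-DP-SGD via Weighted-Gaussian Gradient Estimation) and the private/public batch sizes K_priv = 500 and K_pub = 200. A minimal NumPy sketch of one such update is below: a clipped, Gaussian-noised gradient from the private batch is combined with an exact gradient from the public batch via a convex weight. The function name `semi_dp_sgd_step` and the parameters `w`, `clip`, and `noise_mult` are illustrative stand-ins, not the paper's exact estimator or tuned values.

```python
import numpy as np

def semi_dp_sgd_step(theta, grad_fn, priv_batch, pub_batch, lr=0.1,
                     clip=1.0, noise_mult=1.0, w=0.5, rng=None):
    """One illustrative semi-DP-SGD-style update (sketch, not the paper's exact algorithm).

    theta      : current parameter vector (1-D numpy array)
    grad_fn    : per-example gradient, grad_fn(theta, x) -> array like theta
    priv_batch : iterable of private examples (DP treatment applied)
    pub_batch  : iterable of public examples (no noise, no clipping)
    w          : convex weight on the private gradient estimate
    """
    rng = np.random.default_rng() if rng is None else rng

    # Private side: per-example gradients, clipped to L2 norm `clip`,
    # summed, then perturbed with Gaussian noise of scale noise_mult * clip.
    priv_grads = np.stack([grad_fn(theta, x) for x in priv_batch])
    norms = np.linalg.norm(priv_grads, axis=1, keepdims=True)
    priv_grads = priv_grads / np.maximum(1.0, norms / clip)
    noise = rng.normal(0.0, noise_mult * clip, size=theta.shape)
    g_priv = (priv_grads.sum(axis=0) + noise) / len(priv_batch)

    # Public side: plain minibatch gradient, no privacy treatment needed.
    g_pub = np.mean([grad_fn(theta, x) for x in pub_batch], axis=0)

    # Convex combination of the two estimates, then a gradient step.
    return theta - lr * (w * g_priv + (1.0 - w) * g_pub)
```

Setting w = 1 recovers plain DP-SGD on the private data only, while w = 0 ignores the private data entirely; intermediate weights trade the public gradient's bias (small public sample) against the private gradient's noise, which is the trade-off the paper's weighted estimator is designed to navigate.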