Optimal Differentially Private Model Training with Public Data
Authors: Andrew Lowy, Zeman Li, Tianjian Huang, Meisam Razaviyayn
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our algorithms show benefits over the state-of-the-art. Our experiments show that our algorithms outperform the naïve approaches, even when the optimal DP algorithm is pre-trained on the public data. For example, our Algorithm 1 achieves a significant improvement in CIFAR-10 image classification tasks, reducing test error by 8.9% for logistic regression and by up to 18.9% for WideResNet, compared with the naïve approaches. See also Section 4, Numerical Experiments. |
| Researcher Affiliation | Academia | 1University of Wisconsin-Madison, Wisconsin Institute of Discovery, Madison, WI, USA 2Department of Industrial & Systems Engineering, University of Southern California, Los Angeles, CA, USA. |
| Pseudocode | Yes | Algorithm 1, Semi-DP-SGD via Weighted-Gaussian Gradient Estimation, on page 5. (A hedged sketch of such an update step appears after this table.) |
| Open Source Code | Yes | Code for all of the experiments is available here: https://github.com/optimization-for-data-driven-science/DP-with-public-data. |
| Open Datasets | Yes | We evaluate the performance of Algorithm 1 in training a logistic regression model to classify images in the CIFAR-10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Our synthetic dataset consists of n = 30,000 training samples, 7,500 validation samples, and 37,500 test samples. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, memory, or specific cloud instances used for running the experiments. |
| Software Dependencies | No | The paper mentions the PyTorch privacy framework Opacus but does not provide specific version numbers for any software dependencies. (See the note on Opacus after this table.) |
| Experiment Setup | Yes | We set the private batch size K_priv = 500, public batch size K_pub = 200, and iterations T = 5000. All algorithms undergo extensive hyperparameter tuning using the validation dataset... See Tables 1 and 2 in Appendix G.2 for detailed descriptions of the hyperparameter search grids. (These values are echoed in the sketch below.) |
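
The Pseudocode and Experiment Setup rows name Algorithm 1, "Semi-DP-SGD via Weighted-Gaussian Gradient Estimation," and its batch sizes. Below is a minimal Python sketch of what one such update step could look like, assuming the step mixes a clipped, Gaussian-noised private minibatch gradient with a noiseless public minibatch gradient via a weighting coefficient `alpha`. The function and parameter names are illustrative, not the authors' implementation; the exact weighting and noise calibration are as specified in the paper.

```python
import numpy as np

def semi_dp_sgd_step(w, priv_grads, pub_grads, clip_norm, noise_mult, alpha, lr):
    """One hypothetical semi-DP step: a weighted mix of a DP private-gradient
    estimate and a clean public-gradient estimate (names are illustrative)."""
    # Clip each private per-example gradient to bound its L2 sensitivity.
    norms = np.linalg.norm(priv_grads, axis=1, keepdims=True)
    clipped = priv_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Average the clipped gradients and add Gaussian noise, DP-SGD style.
    k_priv = priv_grads.shape[0]  # e.g., K_priv = 500 in the paper's setup
    noisy_priv = clipped.mean(axis=0) + np.random.normal(
        scale=noise_mult * clip_norm / k_priv, size=w.shape)
    # The public minibatch gradient incurs no privacy cost, so no noise is added.
    pub_grad = pub_grads.mean(axis=0)  # e.g., K_pub = 200 in the paper's setup
    # Weighted combination of the two estimators, then a gradient step.
    grad = alpha * noisy_priv + (1.0 - alpha) * pub_grad
    return w - lr * grad
```

To match the reported setup, such a step would run for T = 5000 iterations with K_priv = 500 and K_pub = 200; `alpha`, `clip_norm`, `noise_mult`, and `lr` would come from the hyperparameter grids in Appendix G.2.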
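
On the Software Dependencies row: the paper reports using the PyTorch privacy framework Opacus without a version pin. For context only, this is how a standard DP-SGD training setup is wired up in the Opacus 1.x API; the model, optimizer, data loader, and privacy parameters here are placeholders, not taken from the paper's code.

```python
import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(3 * 32 * 32, 10)  # placeholder logistic-regression model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data_loader = torch.utils.data.DataLoader(  # placeholder private dataset
    torch.utils.data.TensorDataset(torch.randn(500, 3 * 32 * 32),
                                   torch.randint(0, 10, (500,))),
    batch_size=500)

# Opacus wraps the training objects so that per-sample gradients are clipped
# and Gaussian noise is added during optimizer.step().
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,  # placeholder values, not the paper's
    max_grad_norm=1.0,
)
```

Pinning the Opacus (and PyTorch) versions in a requirements file would close the reproducibility gap flagged in the table.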