Why Is Public Pretraining Necessary for Private Model Training?

Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Guha Thakurta, Lun Wang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Further, systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis.
Researcher Affiliation | Collaboration | ¹Google. ²University of Toronto; part of this work done while the author was an intern at Google. ³University of Washington.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete statements or links regarding the availability of its source code.
Open Datasets | Yes | Systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis... We train a Conformer M (Gulati et al., 2020) model on Librispeech (Panayotov et al., 2015) dataset... We split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000.
Dataset Splits | Yes | We split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions 'Adam optimizer' but does not specify version numbers for any software libraries or dependencies.
Experiment Setup | Yes | We train for 60 epochs with a clipping norm of one, learning rate of 0.001, batch size of 256, and Adam optimizer. Simulating an ID public data setting, we split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000. We use Adam optimizer with learning rate of 0.002 for the public dataset... We train a Conformer M model on the complete Librispeech dataset for 100k steps... We train a Conformer M model on 90% samples drawn uniformly from the Librispeech dataset using DP-Adam for 20k steps... We pretrain a Conformer M model on the 10% of the samples with Adam for 10k steps and then fine-tune on the remaining 90% samples with privacy for 1k steps. Note that the hyper-parameters for the latter two settings are tuned to optimize the test word error rate under the same privacy budget ε = 9.8. We fix the privacy parameter δ to 1e-6, ensuring that δ < n⁻¹, where n is the number of private samples.
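
The "Dataset Splits" row above reports a 2,000 / 58,000 public–private split of the 60,000 CIFAR10 images. The paper does not describe how the split was implemented; the sketch below is a minimal reconstruction assuming a PyTorch/torchvision pipeline, where the combined train+test pool is inferred from the quoted 60,000-image total and the fixed seed is an illustrative choice rather than a detail from the paper.

```python
# Minimal sketch of the CIFAR10 public/private split (2,000 / 58,000) from the
# "Dataset Splits" row. PyTorch/torchvision and the seed are assumptions; the
# paper does not specify the framework or the randomization.
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# CIFAR10 ships as 50,000 train + 10,000 test images; treating all 60,000 as
# one pool matches the quoted total, from which a small "public" subset is drawn.
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
full_set = ConcatDataset([train_set, test_set])

generator = torch.Generator().manual_seed(0)  # illustrative seed, not from the paper
public_set, private_set = random_split(full_set, [2_000, 58_000], generator=generator)

print(len(public_set), len(private_set))  # 2000 58000
```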
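
The "Experiment Setup" row quotes the CIFAR10 private-training hyper-parameters (60 epochs, clipping norm of one, learning rate 0.001, batch size 256, Adam, δ = 1e-6). The paper does not name a DP training library, and the CIFAR10 privacy budget is not stated in this excerpt; the sketch below assumes Opacus, a toy linear model, random stand-in data, and a placeholder ε, purely to show how the quoted numbers map onto a DP-Adam configuration.

```python
# Hedged sketch of the quoted DP fine-tuning configuration. Opacus, the toy
# model/data, and target_epsilon are assumptions; the epochs, clipping norm,
# learning rate, batch size, and delta come from the "Experiment Setup" row.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-ins so the configuration runs end to end; the real experiment
# fine-tunes a model (pretrained on the 2,000 public images) on the 58,000
# private images.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
private_set = TensorDataset(torch.randn(1_024, 3, 32, 32),
                            torch.randint(0, 10, (1_024,)))

private_loader = DataLoader(private_set, batch_size=256, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam with lr 0.001, as quoted

privacy_engine = PrivacyEngine()
model, optimizer, private_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=private_loader,
    epochs=60,            # "We train for 60 epochs"
    target_epsilon=8.0,   # placeholder: the CIFAR10 privacy budget is not given in this excerpt
    target_delta=1e-6,    # delta fixed to 1e-6, satisfying delta < 1/n for n = 58,000
    max_grad_norm=1.0,    # "clipping norm of one"
)
```

Wrapping Adam this way gives the DP-Adam behaviour the excerpt refers to: per-example gradients are clipped to the given norm and Gaussian noise calibrated to the (ε, δ) target is added before each Adam update.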