Why Is Public Pretraining Necessary for Private Model Training?

Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Guha Thakurta, Lun Wang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Further, systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis.
Researcher Affiliation | Collaboration | ¹Google. ²University of Toronto; part of this work done while the author was an intern at Google. ³University of Washington.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete statements or links regarding the availability of its source code.
Open Datasets | Yes | Systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis... We train a Conformer M (Gulati et al., 2020) model on Librispeech (Panayotov et al., 2015) dataset... We split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000.
Dataset Splits | Yes | We split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions 'Adam optimizer' but does not specify version numbers for any software libraries or dependencies.
Experiment Setup | Yes | We train for 60 epochs with a clipping norm of one, learning rate of 0.001, batch size of 256, and Adam optimizer. Simulating an ID public data setting, we split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000. We use Adam optimizer with learning rate of 0.002 for the public dataset... We train a Conformer M model on the complete Librispeech dataset for 100k steps... We train a Conformer M model on 90% samples drawn uniformly from the Librispeech dataset using DP-Adam for 20k steps... We pretrain a Conformer M model on the 10% of the samples with Adam for 10k steps and then fine-tune on the remaining 90% samples with privacy for 1k steps. Note that the hyper-parameters for the latter two settings are tuned to optimize the test word error rate under the same privacy budget ε = 9.8. We fix the privacy parameter δ to 1e-6, ensuring that δ < n⁻¹, where n is the number of private samples.
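
The "Dataset Splits" row above reports a 2,000 / 58,000 public–private split of the 60,000 CIFAR10 images. The paper does not describe how the split was implemented; the sketch below is a minimal reconstruction assuming a PyTorch/torchvision pipeline, where the combined train+test pool is inferred from the quoted 60,000-image total and the fixed seed is an illustrative choice rather than a detail from the paper.

```python
# Minimal sketch of the CIFAR10 public/private split (2,000 / 58,000) from the
# "Dataset Splits" row. PyTorch/torchvision and the seed are assumptions; the
# paper does not specify the framework or the randomization.
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# CIFAR10 ships as 50,000 train + 10,000 test images; treating all 60,000 as
# one pool matches the quoted total, from which a small "public" subset is drawn.
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
full_set = ConcatDataset([train_set, test_set])

generator = torch.Generator().manual_seed(0)  # illustrative seed, not from the paper
public_set, private_set = random_split(full_set, [2_000, 58_000], generator=generator)

print(len(public_set), len(private_set))  # 2000 58000
```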
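
The "Experiment Setup" row quotes the CIFAR10 private-training hyper-parameters (60 epochs, clipping norm of one, learning rate 0.001, batch size 256, Adam, δ = 1e-6). The paper does not name a DP training library, and the CIFAR10 privacy budget is not stated in this excerpt; the sketch below assumes Opacus, a toy linear model, random stand-in data, and a placeholder ε, purely to show how the quoted numbers map onto a DP-Adam configuration.

```python
# Hedged sketch of the quoted DP fine-tuning configuration. Opacus, the toy
# model/data, and target_epsilon are assumptions; the epochs, clipping norm,
# learning rate, batch size, and delta come from the "Experiment Setup" row.
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-ins so the configuration runs end to end; the real experiment
# fine-tunes a model (pretrained on the 2,000 public images) on the 58,000
# private images.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
private_set = TensorDataset(torch.randn(1_024, 3, 32, 32),
                            torch.randint(0, 10, (1_024,)))

private_loader = DataLoader(private_set, batch_size=256, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam with lr 0.001, as quoted

privacy_engine = PrivacyEngine()
model, optimizer, private_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=private_loader,
    epochs=60,            # "We train for 60 epochs"
    target_epsilon=8.0,   # placeholder: the CIFAR10 privacy budget is not given in this excerpt
    target_delta=1e-6,    # delta fixed to 1e-6, satisfying delta < 1/n for n = 58,000
    max_grad_norm=1.0,    # "clipping norm of one"
)
```

Wrapping Adam this way gives the DP-Adam behaviour the excerpt refers to: per-example gradients are clipped to the given norm and Gaussian noise calibrated to the (ε, δ) target is added before each Adam update.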