Why Is Public Pretraining Necessary for Private Model Training?
Authors: Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Guha Thakurta, Lun Wang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Further, systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis. |
| Researcher Affiliation | Collaboration | ¹Google. ²University of Toronto (part of this work was done while the author was an intern at Google). ³University of Washington. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete statements or links regarding the availability of its source code. |
| Open Datasets | Yes | Systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis... We train a Conformer M (Gulati et al., 2020) model on Librispeech (Panayotov et al., 2015) dataset... We split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000. |
| Dataset Splits | Yes | We split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000. (See the split sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' but does not specify version numbers for any software libraries or dependencies. |
| Experiment Setup | Yes | We train for 60 epochs with a clipping norm of one, learning rate of 0.001, batch size of 256, and Adam optimizer. Simulating an ID public data setting, we split CIFAR10 (60,000 images) into a public dataset of size 2,000 and a private dataset of size 58,000. We use Adam optimizer with learning rate of 0.002 for the public dataset... We train a Conformer M model on the complete Librispeech dataset for 100k steps... We train a Conformer M model on 90% samples drawn uniformly from the Librispeech dataset using DP-Adam for 20k steps... We pretrain a Conformer M model on the 10% of the samples with Adam for 10k steps and then fine-tune on the remaining 90% samples with privacy for 1k steps. Note that the hyper-parameters for the latter two settings are tuned to optimize the test word error rate under the same privacy budget ε = 9.8. We fix the privacy parameter δ to 10⁻⁶, ensuring that δ < n⁻¹, where n is the number of private samples. (See the configuration sketch below the table.) |
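
The 2,000/58,000 in-distribution split quoted in the Open Datasets and Dataset Splits rows can be illustrated with standard tooling. The sketch below is a hypothetical reconstruction using PyTorch/torchvision (the paper does not state its data pipeline); only the split sizes come from the paper.

```python
# Hypothetical sketch of the ID public/private CIFAR10 split (2,000 / 58,000 images).
# PyTorch/torchvision and the uniform random split are assumptions; only the sizes are from the paper.
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
cifar_train = datasets.CIFAR10("data", train=True, download=True, transform=transform)
cifar_test = datasets.CIFAR10("data", train=False, download=True, transform=transform)

# Pool all 60,000 CIFAR10 images, then carve out a small public set and a large private set.
full = ConcatDataset([cifar_train, cifar_test])
public_set, private_set = random_split(
    full, [2_000, 58_000], generator=torch.Generator().manual_seed(0)
)
assert len(public_set) == 2_000 and len(private_set) == 58_000
```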
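
The Experiment Setup row reports DP training for 60 epochs with a per-example clipping norm of one, learning rate 0.001, batch size 256, the Adam optimizer, and δ = 10⁻⁶. A minimal configuration sketch follows, assuming Opacus (≥ 1.0) for the DP machinery and a small stand-in network; the paper does not name its DP implementation or its CIFAR10 architecture, and the ε = 9.8 budget quoted above is reported for the Librispeech comparison.

```python
# Hypothetical DP fine-tuning configuration matching the hyperparameters quoted in the table.
# Opacus and the stand-in network are assumptions; the paper specifies neither.
import torch
from torch.utils.data import DataLoader
from opacus import PrivacyEngine

# Stand-in CIFAR10 classifier (BatchNorm-free, so it is compatible with per-example gradients).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
    torch.nn.Linear(32, 10),
)

train_loader = DataLoader(private_set, batch_size=256, shuffle=True)  # private split from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)            # private-phase learning rate

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=60,            # 60 epochs of private training
    target_epsilon=9.8,   # budget quoted in the table (reported for the Librispeech runs)
    target_delta=1e-6,    # δ fixed to 1e-6 < 1/n
    max_grad_norm=1.0,    # per-example clipping norm of one
)
# The wrapped model, optimizer, and loader are then used in an ordinary training loop.
```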