Pre-training Differentially Private Models with Limited Public Data
Authors: Zhiqi Bu, Xinwei Zhang, Sheng Zha, Mingyi Hong, George Karypis
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, using only 10% of public data and 90% of private data, our strategy can achieve DP accuracy of 41.5% on ImageNet-21k (with ϵ = 8), as well as non-DP accuracy of 55.7% and 60.0% on downstream tasks Places365 and iNaturalist-2021, respectively, on par with state-of-the-art standard pre-training and substantially outperforming existing DP pre-trained models. |
| Researcher Affiliation | Collaboration | Zhiqi Bu Amazon Xinwei Zhang University of Southern California Sheng Zha Amazon Mingyi Hong University of Minnesota George Karypis Amazon |
| Pseudocode | Yes | Algorithm 1 DP continual pre-training (a hedged sketch appears after this table) |
| Open Source Code | Yes | Our DP pre-trained models are released in the fastDP library (https://github.com/awslabs/fast-differential-privacy/releases/tag/v2.1). |
| Open Datasets | Yes | We use ImageNet-1k (1.3M images, 1k classes; [25]) for public pre-training, then ImageNet-11k (formally known as ImageNet-21k-P, 11M images, 11k classes; [70]) for private pre-training. |
| Dataset Splits | No | The paper mentions training and testing sets (e.g., '50,000 training and 10,000 test images' for CIFAR-10/100, and 'train:test = 10.5M:0.52M' for ImageNet-11k), but does not explicitly provide percentages or counts for a separate validation split for the primary model training. |
| Hardware Specification | No | The paper mentions 'multi-GPU distributed system' and 'GPU memory' but does not provide specific details on the CPU or GPU models used (e.g., NVIDIA A100, Tesla V100, Intel Xeon, etc.) or other detailed hardware specifications. |
| Software Dependencies | Yes | Our DP pre-trained models are released in the fastDP library (https://github.com/awslabs/fast-differential-privacy/releases/tag/v2.1). |
| Experiment Setup | Yes | We employ the AdamW optimizer with batch size B = 4096 and learning rate η = 0.0002 set by line search (a configuration sketch also appears after this table). |
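
The paper reports Algorithm 1 (DP continual pre-training) as pseudocode, and the released implementation lives in the fastDP library. The following is a minimal sketch of the two-phase idea only: a standard non-DP warm-up on the small public split, followed by DP-SGD-style updates (per-sample gradient clipping plus Gaussian noise) on the large private split. It assumes a PyTorch-style training loop; the function and loader names, the naive per-sample gradient computation, and the default `clip_norm`/`noise_multiplier` values are illustrative assumptions, not the paper's or the library's actual API.

```python
import torch
from torch.optim import AdamW

def continual_pretrain(model, public_loader, private_loader,
                       public_steps, private_steps,
                       clip_norm=1.0, noise_multiplier=0.5, lr=2e-4):
    """Two-phase sketch: non-DP warm-up on public data, then DP-SGD on private data."""
    optimizer = AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.functional.cross_entropy

    # Phase 1: standard (non-DP) pre-training on the small public split.
    for _, (x, y) in zip(range(public_steps), public_loader):
        loss_fn(model(x), y).backward()
        optimizer.step()
        optimizer.zero_grad()

    # Phase 2: DP pre-training on the large private split, DP-SGD style:
    # per-sample gradient clipping followed by calibrated Gaussian noise.
    params = [p for p in model.parameters() if p.requires_grad]
    for _, (x, y) in zip(range(private_steps), private_loader):
        clipped = []
        for xi, yi in zip(x, y):  # naive per-sample gradients (slow but explicit)
            model.zero_grad()
            loss_fn(model(xi[None]), yi[None]).backward()
            g = torch.cat([p.grad.flatten() for p in params])
            scale = torch.clamp(clip_norm / (g.norm() + 1e-6), max=1.0)
            clipped.append(g * scale)
        noisy = torch.stack(clipped).sum(0)
        noisy = noisy + noise_multiplier * clip_norm * torch.randn_like(noisy)
        noisy = noisy / len(clipped)
        # Load the noisy averaged gradient back into the parameters and step.
        offset = 0
        for p in params:
            p.grad = noisy[offset:offset + p.numel()].view_as(p)
            offset += p.numel()
        optimizer.step()
        optimizer.zero_grad()
    return model
```
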
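For the reported experiment setup (AdamW, batch size B = 4096, learning rate η = 0.0002 chosen by line search), the sketch below shows one plausible way to reproduce that selection. The candidate learning-rate grid and the `short_run` evaluation helper are assumptions for illustration; the paper does not specify them.

```python
from torch.optim import AdamW
from torch.utils.data import DataLoader

BATCH_SIZE = 4096
CANDIDATE_LRS = [5e-5, 1e-4, 2e-4, 5e-4, 1e-3]  # assumed search grid

def pick_lr(model_fn, dataset, short_run):
    """Line search: train briefly at each candidate learning rate, keep the best score."""
    scores = {}
    for lr in CANDIDATE_LRS:
        model = model_fn()  # fresh model per candidate
        optimizer = AdamW(model.parameters(), lr=lr)
        loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
        scores[lr] = short_run(model, optimizer, loader)  # e.g., held-out accuracy
    return max(scores, key=scores.get)
```

Under this reading, η = 0.0002 would simply be the grid point that maximized the short-run score before the full pre-training run.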