Differentially Private Learning Needs Better Features (or Much More Data)
Authors: Florian Tramèr, Dan Boneh
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that differentially private machine learning has not yet reached its AlexNet moment on many canonical vision tasks: linear models trained on handcrafted features significantly outperform end-to-end deep neural networks for moderate privacy budgets. To exceed the performance of handcrafted features, we show that private learning requires either much more private data, or access to features learned on public data from a similar domain. Our work introduces simple yet strong baselines for differentially private learning that can inform the evaluation of future progress in this area. |
| Researcher Affiliation | Academia | Florian Tramèr, Stanford University, tramer@cs.stanford.edu; Dan Boneh, Stanford University, dabo@cs.stanford.edu |
| Pseudocode | Yes | Algorithm 1: DP-SGD (Abadi et al., 2016) and Algorithm 2: Private Data Normalization are present in Appendix B (see the illustrative DP-SGD sketch after this table). |
| Open Source Code | Yes | Code to reproduce our experiments is available at https://github.com/ftramer/Handcrafted-DP. |
| Open Datasets | Yes | We compare differentially private ScatterNet classifiers and deep learning models on MNIST (LeCun et al., 2010), Fashion-MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | We compare differentially private ScatterNet classifiers and deep learning models on MNIST (LeCun et al., 2010), Fashion-MNIST (Xiao et al., 2017) and CIFAR-10 (Krizhevsky, 2009). Most prior works find the best model for a given DP budget using a hyper-parameter search. To evaluate ScatterNet models, we apply the following hyper-parameter search: We begin by fixing a privacy schedule. We target a moderate differential privacy budget of (ε = 3, δ = 10⁻⁵) and compute the noise scale σ of DP-SGD so that the privacy budget is consumed after T epochs. We try different values of T, with larger values resulting in training for more steps but with higher noise (see the noise-calibration sketch after this table). |
| Hardware Specification | Yes | We use a NVIDIA Titan Xp GPU with 12GB of RAM for all our experiments. |
| Software Dependencies | No | The paper mentions software such as Kymatio (Andreux et al., 2020) and Opacus (pytorch/opacus, 2020), but does not provide version numbers for these components or for their underlying frameworks (e.g., PyTorch). |
| Experiment Setup | Yes | We begin by fixing a privacy schedule. We target a moderate differential privacy budget of (ε = 3, δ = 10⁻⁵) and compute the noise scale σ of DP-SGD so that the privacy budget is consumed after T epochs. We try different values of T, with larger values resulting in training for more steps but with higher noise. We fix the gradient clipping threshold for DP-SGD to C = 0.1 for all our experiments. We try various batch sizes B and base learning rates η, with linear learning rate scaling (Goyal et al., 2017). We try both Group Normalization (Wu & He, 2018) with different choices for the number of groups, and private Data Normalization with different choices of privacy budgets (see Appendix B for details). We perform a grid-search over all parameters as detailed in Appendix C.5. Table 12: Hyper-parameters for the evaluation of private linear classifiers fine-tuned on ScatterNet features, CNNs fine-tuned on ScatterNet features, and end-to-end CNNs in Section 3. (A ScatterNet feature-extraction sketch follows this table.) |
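
The Pseudocode row points to Algorithm 1, DP-SGD (Abadi et al., 2016). For orientation, here is a minimal sketch of a single DP-SGD update with per-example gradient clipping (threshold C = 0.1, as in the paper) and Gaussian noise; the model, loss function, learning rate, and noise scale are illustrative assumptions, not the authors' exact implementation (which uses Opacus).

```python
import torch

def dp_sgd_step(model, loss_fn, xb, yb, lr=1.0, C=0.1, sigma=1.0):
    """One DP-SGD update: clip each per-example gradient to L2 norm <= C,
    sum the clipped gradients, add Gaussian noise N(0, (sigma*C)^2 I),
    and take an averaged gradient step."""
    params = [p for p in model.parameters() if p.requires_grad]
    grad_sum = [torch.zeros_like(p) for p in params]

    for x, y in zip(xb, yb):                               # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(C / (norm + 1e-6), max=1.0)    # clip factor, at most 1
        for acc, g in zip(grad_sum, grads):
            acc.add_(g, alpha=scale.item())

    with torch.no_grad():
        for p, acc in zip(params, grad_sum):
            noisy = acc + sigma * C * torch.randn_like(acc)  # Gaussian noise
            p.add_(noisy, alpha=-lr / len(xb))               # averaged gradient step
```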
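
The privacy schedule described in the Dataset Splits and Experiment Setup rows fixes the number of epochs T and solves for the DP-SGD noise scale σ so that the (ε = 3, δ = 10⁻⁵) budget is exhausted after T epochs. Below is a minimal sketch of that calibration, assuming a recent Opacus release exposing get_noise_multiplier; the function name, the batch size, and the list of epoch counts are assumptions for illustration (the paper used an earlier Opacus version).

```python
# Calibrate DP-SGD noise so that (epsilon = 3, delta = 1e-5) is spent after T epochs.
from opacus.accountants.utils import get_noise_multiplier

n_train = 50_000                      # CIFAR-10 training-set size
batch_size = 8_192                    # illustrative batch size, not the paper's grid
target_epsilon, target_delta = 3.0, 1e-5

for epochs in (20, 40, 60, 80):       # larger T -> more steps, but more noise per step
    sigma = get_noise_multiplier(
        target_epsilon=target_epsilon,
        target_delta=target_delta,
        sample_rate=batch_size / n_train,
        epochs=epochs,
    )
    print(f"T = {epochs:3d} epochs -> noise multiplier sigma = {sigma:.2f}")
```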
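
The handcrafted features in the Experiment Setup row are wavelet scattering coefficients computed with Kymatio, followed by normalization and a linear classifier. A minimal sketch of that pipeline for CIFAR-10-sized inputs is shown below; the scattering depth J = 2 matches the paper, but the GroupNorm group count and the dummy batch are illustrative assumptions rather than the configuration from Appendix C.5.

```python
# ScatterNet features (Kymatio) feeding a linear classifier, as in the linear-baseline setup.
import torch
import torch.nn as nn
from kymatio.torch import Scattering2D

scattering = Scattering2D(J=2, shape=(32, 32))       # depth-2 scattering on 32x32 images

x = torch.randn(16, 3, 32, 32)                       # dummy CIFAR-10 batch
feats = scattering(x)                                # (16, 3, 81, 8, 8) scattering coefficients
feats = feats.reshape(feats.size(0), -1, 8, 8)       # merge colour and scattering channels -> 243

classifier = nn.Sequential(
    nn.GroupNorm(num_groups=27, num_channels=243),   # Group Normalization over the features
    nn.Flatten(),
    nn.Linear(243 * 8 * 8, 10),                      # linear model over 10 classes
)
logits = classifier(feats)                           # (16, 10)
```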