Scalable DP-SGD: Shuffling vs. Poisson Subsampling
Authors: Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide new lower bounds on the privacy guarantee... To understand the impact of this gap on the utility of trained machine learning models, we introduce a practical approach... We compare the utility of models trained with Poisson-subsampling-based DP-SGD... Our detailed experimental results are presented in Section 4... |
| Researcher Affiliation | Industry | Lynn Chua Google Research chualynn@google.com Badih Ghazi Google Research badihghazi@gmail.com Pritish Kamath Google Research pritishk@google.com Ravi Kumar Google Research ravi.k53@gmail.com Pasin Manurangsi Google Research pasin@google.com Amer Sinha Google Research amersinha@google.com Chiyuan Zhang Google Research chiyuan@google.com |
| Pseudocode | Yes | Algorithm 1 DP-SGD: Differentially Private Stochastic Gradient Descent [Abadi et al., 2016], Algorithm 2 ABLQB: Adaptive Batch Linear Queries (as formalized in Chua et al. [2024]), Algorithm 3 Πb,T (n; π): Permutation Batch Sampler, Algorithm 4 Pb,B,T : Truncated Poisson Batch Sampler, and code snippet in Appendix A. |
| Open Source Code | Yes | We provide the implementation of our privacy accounting methods described above in an IPython notebook hosted on Google Colab, executable using the freely available Python CPU runtime. |
| Open Datasets | Yes | We run our experiments on the Criteo Display Ads pCTR Dataset [Jean-Baptiste Tien, 2014] |
| Dataset Splits | Yes | We use the labeled training set from the dataset, split chronologically into an 80%/10%/10% partition of train/validation/test sets. |
| Hardware Specification | Yes | The training is done using NVIDIA Tesla P100 GPUs, where each epoch of training takes 1-2 hours on a single GPU. |
| Software Dependencies | No | The paper mentions software like "TensorFlow Privacy, JAX Privacy [Balle et al., 2022] and PyTorch Opacus [Yousefpour et al., 2021]", "Apache Beam", "Apache Flink, Apache Spark, or Google Cloud Dataflow", and imports `apache_beam`, `numpy`, `tensorflow` in a code snippet. However, specific version numbers for these software packages are not provided. |
| Experiment Setup | Yes | We use a neural network with five layers and 78M parameters as the model. The first layer consists of feature transforms for each of the categorical and integer features. Categorical features are mapped into dense feature vectors using an embedding layer, where the embedding dimensions are fixed at 48. We apply a log transform for the remaining integer features, and concatenate all the features together. The next three layers are fully connected layers with 598 hidden units each and a ReLU activation function. The last layer consists of a fully connected layer which gives a scalar logit prediction. We use the Adam or Adagrad optimizer with a base learning rate in {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1}, which is scaled with a cosine decay, and we tune the norm bound C ∈ {1, 5, 10, 50}. For the experiments with varying batch sizes, we use batch sizes that are powers of 2 between 1024 and 262144, with corresponding maximum batch sizes B in {1328, 2469, 4681, 9007, 17520, 34355, 67754, 134172, 266475}. |
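The pseudocode row above contrasts the paper's two batch samplers: the Permutation Batch Sampler Πb,T (shuffling) and the Truncated Poisson Batch Sampler Pb,B,T. The following is a minimal numpy sketch of the two sampling schemes as described there; it is an illustration of the sampling logic, not the paper's actual implementation, and the function names and the choice to truncate oversized Poisson batches by random subselection are our own assumptions.

```python
import numpy as np

def permutation_batches(n, b, T, rng):
    """Shuffle-based sampler (sketch of Πb,T): take one random
    permutation of the n examples and cut it into T consecutive
    fixed-size batches of b examples each."""
    perm = rng.permutation(n)
    return [perm[t * b:(t + 1) * b] for t in range(T)]

def truncated_poisson_batches(n, b, B, T, rng):
    """Poisson-style sampler (sketch of Pb,B,T): for each of the T
    steps, every example joins the batch independently with
    probability b/n, so batch sizes are random with mean b; batches
    exceeding the cap B are truncated (here by random subselection,
    an illustrative choice)."""
    batches = []
    for _ in range(T):
        mask = rng.random(n) < b / n
        idx = np.flatnonzero(mask)
        if len(idx) > B:
            idx = rng.choice(idx, size=B, replace=False)
        batches.append(idx)
    return batches
```

The key difference the paper's privacy accounting exploits: under shuffling every example appears in exactly one batch per epoch, while under Poisson subsampling membership is independent across batches, which is what standard amplification-by-subsampling analyses assume.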
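The dataset-splits row states that the labeled training set is split chronologically into an 80%/10%/10% train/validation/test partition. A minimal sketch of such a chronological (no-shuffle) index split, assuming the data is already in time order; the function name and exact rounding behavior are our own illustrative choices:

```python
import numpy as np

def chronological_split(num_examples, train_frac=0.8, val_frac=0.1):
    """Partition indices [0, num_examples) in time order, without
    shuffling, into contiguous train/validation/test ranges
    (e.g. 80%/10%/10%); the test set takes whatever remains."""
    n_train = int(num_examples * train_frac)
    n_val = int(num_examples * val_frac)
    train = np.arange(0, n_train)
    val = np.arange(n_train, n_train + n_val)
    test = np.arange(n_train + n_val, num_examples)
    return train, val, test
```

Keeping the split chronological avoids leaking future examples into the training set, which matters for CTR data where feature distributions drift over time.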