Scalable DP-SGD: Shuffling vs. Poisson Subsampling
Authors: Lynn Chua, Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide new lower bounds on the privacy guarantee... To understand the impact of this gap on the utility of trained machine learning models, we introduce a practical approach... We compare the utility of models trained with Poisson-subsampling-based DP-SGD... Our detailed experimental results are presented in Section 4... |
| Researcher Affiliation | Industry | Lynn Chua Google Research chualynn@google.com Badih Ghazi Google Research badihghazi@gmail.com Pritish Kamath Google Research pritishk@google.com Ravi Kumar Google Research ravi.k53@gmail.com Pasin Manurangsi Google Research pasin@google.com Amer Sinha Google Research amersinha@google.com Chiyuan Zhang Google Research chiyuan@google.com |
| Pseudocode | Yes | Algorithm 1 DP-SGD: Differentially Private Stochastic Gradient Descent [Abadi et al., 2016], Algorithm 2 ABLQB: Adaptive Batch Linear Queries (as formalized in Chua et al. [2024]), Algorithm 3 Πb,T (n; π): Permutation Batch Sampler, Algorithm 4 Pb,B,T : Truncated Poisson Batch Sampler, and code snippet in Appendix A. |
| Open Source Code | Yes | We provide the implementation of our privacy accounting methods described above in an IPython notebook hosted on Google Colab, executable using the freely available Python CPU runtime. |
| Open Datasets | Yes | We run our experiments on the Criteo Display Ads pCTR Dataset [Jean-Baptiste Tien, 2014] |
| Dataset Splits | Yes | We use the labeled training set from the dataset, split chronologically into an 80%/10%/10% partition of train/validation/test sets. |
| Hardware Specification | Yes | The training is done using NVIDIA Tesla P100 GPUs, where each epoch of training takes 1-2 hours on a single GPU. |
| Software Dependencies | No | The paper mentions software like "TensorFlow Privacy, JAX Privacy [Balle et al., 2022] and PyTorch Opacus [Yousefpour et al., 2021]", "Apache Beam", "Apache Flink, Apache Spark, or Google Cloud Dataflow", and imports `apache_beam`, `numpy`, `tensorflow` in a code snippet. However, specific version numbers for these software packages are not provided. |
| Experiment Setup | Yes | We use a neural network with five layers and 78M parameters as the model. The first layer consists of feature transforms for each of the categorical and integer features. Categorical features are mapped into dense feature vectors using an embedding layer, where the embedding dimensions are fixed at 48. We apply a log transform for the remaining integer features, and concatenate all the features together. The next three layers are fully connected layers with 598 hidden units each and a ReLU activation function. The last layer consists of a fully connected layer which gives a scalar logit prediction. We use the Adam or Adagrad optimizer with a base learning rate in {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1}, which is scaled with a cosine decay, and we tune the norm bound C ∈ {1, 5, 10, 50}. For the experiments with varying batch sizes, we use batch sizes that are powers of 2 between 1024 and 262144, with corresponding maximum batch sizes B in {1328, 2469, 4681, 9007, 17520, 34355, 67754, 134172, 266475}. |
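The pseudocode row above contrasts the paper's two batch samplers: the Permutation Batch Sampler Πb,T (shuffling) and the Truncated Poisson Batch Sampler Pb,B,T. The following is a minimal numpy sketch of the two sampling schemes as described there; it is an illustration of the sampling logic, not the paper's actual implementation, and the function names and the choice to truncate oversized Poisson batches by random subselection are our own assumptions.

```python
import numpy as np

def permutation_batches(n, b, T, rng):
    """Shuffle-based sampler (sketch of Πb,T): take one random
    permutation of the n examples and cut it into T consecutive
    fixed-size batches of b examples each."""
    perm = rng.permutation(n)
    return [perm[t * b:(t + 1) * b] for t in range(T)]

def truncated_poisson_batches(n, b, B, T, rng):
    """Poisson-style sampler (sketch of Pb,B,T): for each of the T
    steps, every example joins the batch independently with
    probability b/n, so batch sizes are random with mean b; batches
    exceeding the cap B are truncated (here by random subselection,
    an illustrative choice)."""
    batches = []
    for _ in range(T):
        mask = rng.random(n) < b / n
        idx = np.flatnonzero(mask)
        if len(idx) > B:
            idx = rng.choice(idx, size=B, replace=False)
        batches.append(idx)
    return batches
```

The key difference the paper's privacy accounting exploits: under shuffling every example appears in exactly one batch per epoch, while under Poisson subsampling membership is independent across batches, which is what standard amplification-by-subsampling analyses assume.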
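The dataset-splits row states that the labeled training set is split chronologically into an 80%/10%/10% train/validation/test partition. A minimal sketch of such a chronological (no-shuffle) index split, assuming the data is already in time order; the function name and exact rounding behavior are our own illustrative choices:

```python
import numpy as np

def chronological_split(num_examples, train_frac=0.8, val_frac=0.1):
    """Partition indices [0, num_examples) in time order, without
    shuffling, into contiguous train/validation/test ranges
    (e.g. 80%/10%/10%); the test set takes whatever remains."""
    n_train = int(num_examples * train_frac)
    n_val = int(num_examples * val_frac)
    train = np.arange(0, n_train)
    val = np.arange(n_train, n_train + n_val)
    test = np.arange(n_train + n_val, num_examples)
    return train, val, test
```

Keeping the split chronological avoids leaking future examples into the training set, which matters for CTR data where feature distributions drift over time.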