Fast and Memory Efficient Differentially Private-SGD via JL Projections

Authors: Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, Uthaipon Tantipongpipat

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate experimentally that compared to existing implementations of DP-SGD with exact per-sample gradient clipping, our optimizers have significant advantages in speed and memory cost while achieving comparable accuracy-vs-privacy tradeoff.
Researcher Affiliation | Collaboration | Zhiqi Bu (University of Pennsylvania, zbu@sas.upenn.edu); Sivakanth Gopi (Microsoft Research, sigopi@microsoft.com); Janardhan Kulkarni (Microsoft Research, jakul@microsoft.com); Yin Tat Lee (University of Washington, yintat@uw.edu); Judy Hanwen Shen (Stanford University, jhshen@stanford.edu); Uthaipon Tantipongpipat (Twitter, uthaipon@gmail.com)
Pseudocode | Yes | Algorithm 1: Differentially private SGD using JL projections (DP-SGD-JL). (A hedged sketch of one such step appears after the table.)
Open Source Code | Yes | The code for our experiments is available in the supplementary material.
Open Datasets | Yes | on the IMDb dataset for sentiment analysis. We train the same single-layer bidirectional LSTM as in the [Ten] tutorial, using the same IMDb dataset with 8k vocabulary. We train a convolutional neural network from the [TP] tutorial on the MNIST dataset, which has 60,000 training samples. (A loading example appears after the table.)
Dataset Splits | No | The paper mentions training data sizes (e.g., 25,000 training samples for IMDb, 60,000 training samples for MNIST) but does not provide explicit train/validation/test dataset splits or their percentages/counts.
Hardware Specification | Yes | We use one Tesla P100 16GB GPU for all experiments.
Software Dependencies | Yes | We use TensorFlow and [TP] for all our experiments because [Opa] does not support arbitrary network architectures. Moreover, TensorFlow has an efficient implementation of jvp while PyTorch doesn't. Supported in tf-nightly 2.4.0.dev20200924 as tf.autodiff.ForwardAccumulator(θ, v).jvp(F). JAX also has an implementation of jvp. (A small jvp example appears after the table.)
Experiment Setup | Yes | We set β1 = 0.9, β2 = 0.999, σ = 0.6, C = 1, B = 256, η = 0.001, E = 15. We use Adam as the optimizer. (An illustrative mapping of these hyperparameters appears after the table.)
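
The Software Dependencies row cites the forward-mode primitive tf.autodiff.ForwardAccumulator(θ, v).jvp(F). The toy snippet below only illustrates that call pattern (the accumulator is used as a context manager in released TensorFlow); the function F, θ, and v here are arbitrary placeholders, not anything from the paper.

```python
import tensorflow as tf

# Forward-mode jvp: one forward pass yields the directional derivative <∇F(θ), v>
# without materialising the full Jacobian.
theta = tf.Variable([1.0, 2.0, 3.0])
v = tf.constant([0.1, -0.2, 0.3])  # tangent (direction) vector

with tf.autodiff.ForwardAccumulator(primals=theta, tangents=v) as acc:
    F = tf.reduce_sum(tf.sin(theta))  # any function of theta evaluated inside the context

print(acc.jvp(F))  # equals sum_i cos(theta_i) * v_i
```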
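
Algorithm 1 (DP-SGD-JL) is only named in the Pseudocode row above. The following TensorFlow sketch shows the idea as we read it from the quoted evidence: per-sample gradient norms are estimated with JL projections computed by forward-mode jvp, clipping is applied by reweighting the per-sample losses so a single backward pass suffices, and Gaussian noise is added. All names (dp_sgd_jl_step, loss_fn, num_proj, etc.) are our own; this is a hedged sketch, not the authors' released code, and it omits privacy accounting.

```python
import tensorflow as tf

def dp_sgd_jl_step(model, loss_fn, x, y, optimizer,
                   clip_norm=1.0, noise_mult=0.6, num_proj=8):
    """One DP-SGD-JL style step (sketch). loss_fn must return per-example losses."""
    params = model.trainable_variables

    # 1) JL projections: each random direction v gives <per-sample gradient, v>
    #    for the whole batch via one forward-mode jvp pass.
    proj = []
    for _ in range(num_proj):
        tangents = [tf.random.normal(p.shape) for p in params]
        with tf.autodiff.ForwardAccumulator(primals=params, tangents=tangents) as acc:
            per_sample_loss = loss_fn(y, model(x))        # shape [batch]
        proj.append(acc.jvp(per_sample_loss))             # shape [batch]
    proj = tf.stack(proj, axis=1)                         # [batch, num_proj]

    # 2) Estimate per-sample gradient norms from the projections.
    est_norm = tf.norm(proj, axis=1) / tf.sqrt(float(num_proj))

    # 3) Clip by reweighting per-sample losses, then one ordinary backward pass.
    weights = tf.minimum(1.0, clip_norm / (est_norm + 1e-12))
    with tf.GradientTape() as tape:
        per_sample_loss = loss_fn(y, model(x))
        weighted_loss = tf.reduce_sum(tf.stop_gradient(weights) * per_sample_loss)
    grads = tape.gradient(weighted_loss, params)

    # 4) Add Gaussian noise calibrated to the clipping norm, average, and update.
    batch = tf.cast(tf.shape(x)[0], tf.float32)
    noisy = [(g + tf.random.normal(g.shape, stddev=noise_mult * clip_norm)) / batch
             for g in grads]
    optimizer.apply_gradients(zip(noisy, params))
```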
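
The Open Datasets row names IMDb with an 8k vocabulary and MNIST with 60,000 training samples. A minimal way to obtain both through Keras is shown below; the paper does not state that it uses these exact loaders, so treat this purely as an illustration.

```python
import tensorflow as tf

# IMDb sentiment analysis with an 8k-word vocabulary, as in the quoted setup.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=8000)

# MNIST: 60,000 training images, 10,000 test images.
(mx_train, my_train), (mx_test, my_test) = tf.keras.datasets.mnist.load_data()
```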
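
The Experiment Setup row lists β1, β2, σ, C, B, η, E and Adam. Below is one plausible mapping of those symbols onto a standard Keras Adam optimizer plus the DP and schedule parameters; the variable names and the mapping itself are our assumption, not the authors' training script.

```python
import tensorflow as tf

# Adam with the quoted β1, β2 and learning rate η.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

noise_multiplier = 0.6   # σ  (noise multiplier for DP)
clip_norm = 1.0          # C  (per-sample gradient clipping norm)
batch_size = 256         # B
epochs = 15              # E
```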