Fast and Memory Efficient Differentially Private-SGD via JL Projections
Authors: Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, Uthaipon Tantipongpipat
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate experimentally that compared to existing implementations of DP-SGD with exact per-sample gradient clipping, our optimizers have significant advantages in speed and memory cost while achieving comparable accuracy-vs-privacy tradeoff. |
| Researcher Affiliation | Collaboration | Zhiqi Bu, University of Pennsylvania (zbu@sas.upenn.edu); Sivakanth Gopi, Microsoft Research (sigopi@microsoft.com); Janardhan Kulkarni, Microsoft Research (jakul@microsoft.com); Yin Tat Lee, University of Washington (yintat@uw.edu); Judy Hanwen Shen, Stanford University (jhshen@stanford.edu); Uthaipon Tantipongpipat, Twitter (uthaipon@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Differentially private SGD using JL projections (DP-SGD-JL); a hedged sketch of one such update step is given below the table. |
| Open Source Code | Yes | The code for our experiments is available in the supplementary material. |
| Open Datasets | Yes | on the IMDb dataset for sentiment analysis. We train the same single-layer bidirectional LSTM as in the [Ten] tutorial, using the same IMDb dataset with 8k vocabulary. We train a convolutional neural network from the [TP] tutorial on the MNIST dataset, which has 60,000 training samples. |
| Dataset Splits | No | The paper mentions training data size (e.g., '25,000 training samples' for IMDb, '60,000 training samples' for MNIST) but does not provide explicit train/validation/test dataset splits or their percentages/counts. |
| Hardware Specification | Yes | We use one Tesla P100 16GB GPU for all experiments. |
| Software Dependencies | Yes | We use TensorFlow and [TP] for all our experiments because [Opa] does not support arbitrary network architectures. Moreover, TensorFlow has an efficient implementation of jvp while PyTorch doesn't. Supported in tf-nightly 2.4.0.dev20200924 as tf.autodiff.ForwardAccumulator(θ, v).jvp(F). JAX also has an implementation of jvp. A minimal usage sketch appears below the table. |
| Experiment Setup | Yes | We set β1 = 0.9, β2 = 0.999, σ = 0.6, C = 1, B = 256, η = 0.001, E = 15. We use Adam as the optimizer. This configuration is written out as a short sketch below the table. |
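
The pseudocode row above refers to Algorithm 1 (DP-SGD-JL). Below is a minimal sketch of one such update step in TensorFlow 2.x, assuming a hypothetical Keras `model`, a per-example `loss_fn` (reduction='none'), a minibatch `(x, y)`, and a Keras `optimizer`. It illustrates the JL-projection idea under those assumptions and is not the authors' released implementation.

```python
# A minimal sketch of one DP-SGD-JL update step (Algorithm 1) in TensorFlow 2.x.
# `model`, `loss_fn`, `x`, `y`, and `optimizer` are placeholders supplied by the
# reader; they are not from the paper's released code.
import tensorflow as tf

def dp_sgd_jl_step(model, loss_fn, x, y, optimizer,
                   r=10, clip_norm=1.0, noise_multiplier=0.6):
    params = model.trainable_variables

    # Estimate every per-example gradient norm with r JL (Gaussian) projections:
    # one forward-mode pass yields <grad_i, v> for all examples i at once.
    sq_norms = 0.0
    for _ in range(r):
        tangents = [tf.random.normal(p.shape) for p in params]
        with tf.autodiff.ForwardAccumulator(primals=params, tangents=tangents) as acc:
            per_example_loss = loss_fn(y, model(x, training=True))   # shape [B]
        proj = acc.jvp(per_example_loss)                              # shape [B]
        sq_norms = sq_norms + tf.square(proj)
    norm_est = tf.sqrt(sq_norms / r)                                  # ~ ||grad_i||

    # Clipping factors c_i = min(1, C / ||grad_i||~), held constant for backprop.
    clip_factors = tf.stop_gradient(tf.minimum(1.0, clip_norm / (norm_est + 1e-12)))

    # One ordinary backward pass on the reweighted loss gives sum_i c_i * grad_i.
    with tf.GradientTape() as tape:
        per_example_loss = loss_fn(y, model(x, training=True))
        weighted_loss = tf.reduce_sum(clip_factors * per_example_loss)
    grads = tape.gradient(weighted_loss, params)

    # Add Gaussian noise scaled by sigma * C, average over the batch, and step.
    batch_size = tf.cast(tf.shape(x)[0], tf.float32)
    noisy_grads = [(g + noise_multiplier * clip_norm * tf.random.normal(tf.shape(g)))
                   / batch_size for g in grads]
    optimizer.apply_gradients(zip(noisy_grads, params))
```

The key point of this sketch is that each `ForwardAccumulator` pass produces the inner products ⟨∇ℓ_i, v⟩ for all examples simultaneously, so per-sample gradient norms are estimated without ever materializing per-sample gradients.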
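
The jvp primitive cited in the software-dependencies row can be exercised in isolation. The following toy example uses an arbitrary function F and point θ (not taken from the paper) to show how `tf.autodiff.ForwardAccumulator` returns the directional derivative ⟨∇F(θ), v⟩:

```python
# Minimal illustration of forward-mode jvp via tf.autodiff.ForwardAccumulator;
# the function F and the point theta are arbitrary stand-ins.
import tensorflow as tf

theta = tf.Variable([1.0, 2.0, 3.0])
v = tf.constant([0.1, -0.2, 0.3])           # tangent direction

with tf.autodiff.ForwardAccumulator(primals=theta, tangents=v) as acc:
    F = tf.reduce_sum(tf.square(theta))      # F(theta) = ||theta||^2
jvp = acc.jvp(F)                             # <grad F(theta), v> = 2 * theta . v
print(jvp.numpy())                           # 0.2 - 0.8 + 1.8 = 1.2
```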
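
For reference, the hyperparameters quoted in the experiment-setup row map onto a Keras Adam optimizer and the DP constants roughly as follows; this is only a sketch of the stated values, not the authors' training script.

```python
# Hyperparameters quoted in the experiment-setup row, written as a sketch.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,  # eta
                                     beta_1=0.9, beta_2=0.999)
noise_multiplier = 0.6   # sigma
clip_norm = 1.0          # C
batch_size = 256         # B
epochs = 15              # E
```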