Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast and Memory Efficient Differentially Private-SGD via JL Projections
Authors: Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, Uthaipon Tantipongpipat
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate experimentally that compared to existing implementations of DP-SGD with exact per-sample gradient clipping, our optimizers have significant advantages in speed and memory cost while achieving comparable accuracy-vs-privacy tradeoff. |
| Researcher Affiliation | Collaboration | Zhiqi Bu (University of Pennsylvania), Sivakanth Gopi (Microsoft Research), Janardhan Kulkarni (Microsoft Research), Yin Tat Lee (University of Washington), Judy Hanwen Shen (Stanford University), Uthaipon Tantipongpipat (Twitter) |
| Pseudocode | Yes | Algorithm 1: Differentially private SGD using JL projections (DP-SGD-JL) |
| Open Source Code | Yes | The code for our experiments is available in the supplementary material. |
| Open Datasets | Yes | on the IMDb dataset for sentiment analysis. We train the same single-layer bidirectional LSTM as in the [Ten] tutorial, using the same IMDb dataset with 8k vocabulary. We train a convolutional neural network from [TP] tutorial on MNIST dataset, which has 60,000 training samples. |
| Dataset Splits | No | The paper mentions training data size (e.g., '25,000 training samples' for IMDb, '60,000 training samples' for MNIST) but does not provide explicit train/validation/test dataset splits or their percentages/counts. |
| Hardware Specification | Yes | We use one Tesla P100 16GB GPU for all experiments. |
| Software Dependencies | Yes | We use TensorFlow and [TP] for all our experiments because [Opa] does not support arbitrary network architectures. Moreover, TensorFlow has an efficient implementation of jvp while PyTorch doesn't. Supported in tf-nightly 2.4.0.dev20200924 as tf.autodiff.ForwardAccumulator(θ, v).jvp(F). JAX also has an implementation of jvp. |
| Experiment Setup | Yes | We set β1 = 0.9, β2 = 0.999, σ = 0.6, C = 1, B = 256, η = 0.001, E = 15. We use Adam as the optimizer. |
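The Software Dependencies row above hinges on forward-mode jvp support. As a minimal sketch of how `tf.autodiff.ForwardAccumulator` yields the per-sample inner products ⟨g_i, v⟩ in a single forward pass (the toy model, shapes, and variable names here are illustrative assumptions, not taken from the paper):

```python
import tensorflow as tf

# A toy model with a per-example loss; any Keras model works the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
model.build(input_shape=(None, 784))
loss_obj = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.keras.losses.Reduction.NONE)

x = tf.random.normal([256, 784])
y = tf.random.uniform([256], maxval=10, dtype=tf.int32)

params = model.trainable_variables
v = [tf.random.normal(p.shape) for p in params]  # one random JL direction

# Forward-mode autodiff: the accumulator pushes the tangent v through the
# computation, so the jvp of the per-example loss vector is exactly
# (<g_1, v>, ..., <g_B, v>), where g_i is sample i's gradient.
with tf.autodiff.ForwardAccumulator(primals=params, tangents=v) as acc:
    per_example_loss = loss_obj(y, model(x))  # shape [256], one loss per sample
jvp_values = acc.jvp(per_example_loss)        # shape [256]
```

Because the jvp touches only the forward pass, no per-sample gradients are ever materialized, which is the source of the memory savings the paper claims.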
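Building on that primitive, here is a hedged sketch of the structure of Algorithm 1 (DP-SGD-JL) as the Pseudocode row summarizes it: estimate each per-sample gradient norm from k JL projections, clip by reweighting the per-example losses, then do one ordinary backward pass plus Gaussian noise. This continues the snippet above; `dp_sgd_jl_step`, `k`, and the noise placement are assumptions for illustration, so consult the paper and its supplementary code for the authors' exact algorithm and privacy accounting.

```python
def dp_sgd_jl_step(model, loss_obj, optimizer, x, y, k=16, C=1.0, sigma=0.6):
    """One training step in the spirit of DP-SGD-JL (sketch, not the authors'
    implementation). loss_obj must return per-example losses (reduction NONE)."""
    params = model.trainable_variables
    batch = tf.cast(x.shape[0], tf.float32)

    # JL estimate of per-sample gradient norms: ||g_i||^2 ≈ (1/k) Σ_j <g_i, v_j>^2.
    sq = tf.zeros([x.shape[0]])
    for _ in range(k):
        v = [tf.random.normal(p.shape) for p in params]
        with tf.autodiff.ForwardAccumulator(primals=params, tangents=v) as acc:
            losses = loss_obj(y, model(x, training=True))
        sq += acc.jvp(losses) ** 2
    norm_est = tf.sqrt(sq / k)

    # Clipping as reweighting: the gradient of Σ_i w_i * loss_i is the sum of
    # clipped per-sample gradients, so a single backward pass suffices.
    w = tf.stop_gradient(tf.minimum(1.0, C / (norm_est + 1e-12)))
    with tf.GradientTape() as tape:
        losses = loss_obj(y, model(x, training=True))
        total = tf.reduce_sum(w * losses)
    grads = tape.gradient(total, params)

    # Gaussian mechanism: noise scaled to the clipping norm C, then average.
    noisy = [(g + tf.random.normal(tf.shape(g), stddev=sigma * C)) / batch
             for g in grads]
    optimizer.apply_gradients(zip(noisy, params))
```

With the hyperparameters quoted in the Experiment Setup row, such a step would be driven by something like `tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)` with `C = 1.0`, `sigma = 0.6`, batch size `B = 256`, and `E = 15` epochs.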