Enabling Fast Differentially Private SGD via Just-in-Time Compilation and Vectorization
Authors: Pranav Subramani, Nicholas Vadivelu, Gautam Kamath
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We thoroughly demonstrate that by exploiting powerful language primitives, including vectorization, just-in-time compilation, and static graph optimization, one can dramatically reduce these overheads, in many cases nearly matching the best non-private running times. These gains are realized in two frameworks. The first is JAX, which provides rich support for these primitives through the XLA compiler. We also rebuild core parts of TensorFlow Privacy, integrating more effective vectorization as well as XLA compilation, granting significant memory and runtime improvements over previous release versions. Our proposed approaches allow us to achieve up to 50x speedups compared to the best alternatives. Table 1 summarizes some of our experimental results, with median running time per epoch for a variety of settings. JAX and Custom TFP are consistently the fastest. We perform an ablation study (Table 2) for all models to pinpoint the source of all improvements. (A hedged JAX sketch of this vectorized per-example-gradient pattern appears after the table.) |
| Researcher Affiliation | Academia | Pranav Subramani, Cheriton School of Computer Science, University of Waterloo (pranav.subramani@uwaterloo.ca); Nicholas Vadivelu, Cheriton School of Computer Science, University of Waterloo (nbvadive@uwaterloo.ca); Gautam Kamath, Cheriton School of Computer Science, University of Waterloo (g@csail.mit.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/TheSalon/fast-dpsgd. |
| Open Datasets | Yes | We evaluate the aforementioned implementations of DPSGD in runtime and memory consumption on three datasets: CIFAR10 [31], a dataset of small colour images with 60,000 training examples of size 32×32×3 each; IMDb [36], a movie review sentiment classification dataset with 25,000 training examples padded to a sequence length of 256 each; and Adult [15], containing 45,220 examples with 104 features, which was preprocessed via methods from [29]. These datasets are available for open use and do not contain personally identifiable information or offensive content. |
| Dataset Splits | No | The paper mentions the total number of training examples for each dataset (e.g., '60,000 training examples' for CIFAR10, '25,000 training examples' for IMDb, and '45,220 examples' for Adult), but does not specify the explicit percentages, sample counts, or methodology for training/validation/test splits. |
| Hardware Specification | Yes | All experiments were run on Ubuntu 18.04 with an Intel Core i7-7800X CPU (3.50 GHz, 6 cores), NVIDIA GTX Titan V GPU (12GB VRAM), and 32GB of RAM. |
| Software Dependencies | No | The paper mentions software components like JAX, TensorFlow 2, PyTorch, Opacus, and XLA, along with the operating system 'Ubuntu 18.04'. However, it does not provide specific version numbers for the programming languages, libraries, or frameworks used (e.g., 'Python 3.x', 'PyTorch 1.x', 'TensorFlow 2.x'). |
| Experiment Setup | Yes | These architectures and datasets are evaluated in terms of runtime at batch sizes 16, 32, 64, 128, and 256. Each experiment was run for 20 epochs and the median epoch runtime is reported. We start with the smallest dataset, Adult, training a 5,532-parameter fully-connected neural network (FCNN). Then, we train a CIFAR10 convolutional neural network classifier architecture with 605,226 parameters... For IMDb, we use an LSTM network with 1,081,002 parameters... (The measurement protocol is sketched after this table.) |
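
The vectorization and JIT-compilation pattern referenced in the Research Type row can be illustrated with a short JAX sketch. This is a minimal, hedged example, not the authors' released implementation (see their repository above for the real code): the toy linear model and the hyperparameters `l2_clip`, `noise_mult`, and `lr` are illustrative assumptions. The point it demonstrates is the paper's core mechanism: `jax.grad` differentiates the loss for a single example, `jax.vmap` vectorizes that over the batch to obtain per-example gradients, and `jax.jit` compiles the entire clipped-and-noised update through XLA.

```python
# Minimal sketch of vectorized DP-SGD in JAX (illustrative, not the paper's code).
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model; the paper's FCNN/CNN/LSTM architectures would go here.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

# Gradient for a single example, vectorized over the batch axis of (x, y).
per_example_grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))

@jax.jit  # the entire private update compiles to one fused XLA computation
def dpsgd_step(params, x, y, key, l2_clip=1.0, noise_mult=1.1, lr=0.1):
    grads = per_example_grads(params, x, y)  # every leaf gains a leading batch dim

    # Per-example global L2 norm across all parameter leaves, then clip factors.
    sq_norms = sum(jnp.sum(g.reshape(g.shape[0], -1) ** 2, axis=1)
                   for g in jax.tree_util.tree_leaves(grads))
    clip = jnp.minimum(1.0, l2_clip / (jnp.sqrt(sq_norms) + 1e-12))

    def clip_sum_noise(g, k):
        # Clip each example's gradient, sum over the batch, add Gaussian noise.
        clipped = g * clip.reshape((-1,) + (1,) * (g.ndim - 1))
        noise = noise_mult * l2_clip * jax.random.normal(k, g.shape[1:])
        return (jnp.sum(clipped, axis=0) + noise) / g.shape[0]

    leaves, treedef = jax.tree_util.tree_flatten(grads)
    keys = jax.random.split(key, len(leaves))
    noisy = treedef.unflatten([clip_sum_noise(g, k) for g, k in zip(leaves, keys)])
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, noisy)

# Usage (shapes are illustrative): params = {"w": jnp.zeros(104), "b": jnp.zeros(())}
# new_params = dpsgd_step(params, x_batch, y_batch, jax.random.PRNGKey(0))
```

The design point the paper exploits is that per-example gradients come from composing `vmap` with `grad` rather than looping over the batch, which is what makes the private update nearly as fast as the non-private one once XLA fuses the whole step.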
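The Experiment Setup row describes the measurement protocol: 20 epochs per configuration, median per-epoch runtime reported, swept over batch sizes 16 through 256. A minimal sketch of that protocol, assuming a hypothetical `run_epoch` callable standing in for one framework's training pass, could look like:

```python
# Hedged sketch of the timing protocol (not the authors' benchmarking harness).
import time
import statistics

def median_epoch_time(run_epoch, data, batch_size, epochs=20):
    times = []
    for _ in range(epochs):
        start = time.perf_counter()
        run_epoch(data, batch_size)      # one full pass over the training set
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# The paper sweeps batch sizes 16, 32, 64, 128, and 256 for each model/dataset:
# for bs in (16, 32, 64, 128, 256):
#     print(bs, median_epoch_time(train_one_epoch, train_data, bs))
```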