Private Adaptive Optimization with Side Information

Authors: Tian Li, Manzil Zaheer, Sashank Reddi, Virginia Smith

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we leverage simple and readily available side information to explore the performance of AdaDPS in practice, comparing to strong baselines in both centralized and federated settings.
Researcher Affiliation | Collaboration | 1 Carnegie Mellon University, 2 Google DeepMind, 3 Google Research
Pseudocode | Yes | AdaDPS in centralized training is summarized in Algorithm 1 (a sketch of the update appears after the table).
Open Source Code | Yes | Our code is publicly available at github.com/litian96/AdaDPS.
Open Datasets | Yes | Datasets. We consider common benchmarks for adaptive optimization in centralized or federated settings (Amid et al., 2021; Reddi et al., 2018a; 2021) involving varying types of models (both convex and non-convex) and data (both text and image data). Stack Overflow (Authors, 2019) consists of posts on the Stack Overflow website, where the task is tag prediction (500-class classification). IMDB (Maas et al., 2011) is widely used for binary sentiment classification of movie reviews, consisting of 25,000 training and 25,000 testing samples. MNIST (LeCun et al., 1998) images are used with a deep autoencoder model (for image reconstruction), which has the same architecture as in previous work (Reddi et al., 2018a) and contains more than 2 million parameters (a model sketch follows the table).
Dataset Splits | Yes | For non-private training experiments, we fix the mini-batch size to 64, and tune fixed learning rates by performing a grid search over {0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1, 2} separately for all methods on validation data (a grid-search sketch follows the table). IMDB (Maas et al., 2011) is widely used for binary sentiment classification of movie reviews, consisting of 25,000 training and 25,000 testing samples.
Hardware Specification | No | The paper does not explicitly describe the hardware used for experiments (e.g., specific GPU/CPU models, memory details).
Software Dependencies | No | The paper does not provide specific version numbers for ancillary software dependencies (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | For non-private training experiments, we fix the mini-batch size to 64, and tune fixed learning rates by performing a grid search over {0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1, 2} separately for all methods on validation data. For differentially private training, the δ values in the privacy budget are always the inverse of the number of training samples. We fix the noise multiplier σ for each dataset, tune the clipping threshold, and compute the final ε values. Specifically, the σ values are 1, 1, and 0.95 for IMDB (convex), IMDB (LSTM), and Stack Overflow; 1 and 0.75 for MNIST (autoencoder). The clipping threshold C (in Algorithm 1) is tuned from {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 3}, jointly with tuning the (fixed) learning rates. The number of micro-batches is 16 for all related experiments, and the mini-batch size is 64 (i.e., we privatize each gradient averaged over 4 individual ones to speed up computation; see the micro-batching sketch after the table).
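
The Pseudocode row only quotes that Algorithm 1 summarizes AdaDPS in centralized training. As a rough illustration of the idea, the sketch below preconditions each per-example gradient with non-sensitive side information before clipping and noising, so the privatized update behaves adaptively. The function names, the reciprocal-square-root preconditioner, and the noise calibration are illustrative assumptions, not the authors' exact Algorithm 1.

```python
import numpy as np

def precondition(grad, side_info, eps=1e-5):
    """Scale the gradient by side information (e.g., per-coordinate gradient
    magnitudes estimated from public data); this specific form is an assumption."""
    return grad / (np.sqrt(side_info) + eps)

def adadps_step(w, per_example_grads, side_info, lr=0.1, clip=1.0, sigma=1.0):
    """One privatized, preconditioned SGD step (illustrative sketch)."""
    acc = np.zeros_like(w)
    for g in per_example_grads:                                # one gradient per example
        g = precondition(g, side_info)                         # apply side information first
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))   # clip the preconditioned gradient
        acc += g
    acc += sigma * clip * np.random.normal(size=w.shape)       # Gaussian mechanism on the sum
    return w - lr * acc / len(per_example_grads)
```

The design point implied by the quoted text is that the side information is non-sensitive and readily available, so using it to precondition gradients before privatization spends no additional privacy budget.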
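The Open Datasets row mentions a deep MNIST autoencoder with more than 2 million parameters but does not spell out the layer sizes. The sketch below assumes the classic 784-1000-500-250-30 fully connected benchmark layout, which matches that parameter count; the exact architecture and activations are assumptions, not details confirmed by the quoted text.

```python
import torch.nn as nn

def mnist_autoencoder():
    """Assumed 784-1000-500-250-30 deep autoencoder with a mirrored decoder."""
    dims = [784, 1000, 500, 250, 30]
    enc, dec = [], []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        enc += [nn.Linear(d_in, d_out), nn.ReLU()]
    rev = list(reversed(dims))
    for d_in, d_out in zip(rev[:-1], rev[1:]):
        dec += [nn.Linear(d_in, d_out), nn.ReLU()]
    dec[-1] = nn.Sigmoid()  # reconstruct pixel intensities in [0, 1]
    return nn.Sequential(*enc, *dec)

model = mnist_autoencoder()
print(sum(p.numel() for p in model.parameters()))  # ~2.8M, consistent with "more than 2 million"
```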
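The tuning protocol in the Dataset Splits and Experiment Setup rows is a plain grid search over fixed learning rates on validation data. A minimal sketch follows; `train_and_eval` is a hypothetical helper (train with a given learning rate, return a validation metric) standing in for the actual training loop, not part of the paper's code.

```python
# Learning-rate grid from the quoted experiment setup; mini-batch size fixed to 64.
LEARNING_RATES = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5, 1, 2]

def tune_learning_rate(train_and_eval, batch_size=64):
    """Pick the fixed learning rate with the best validation metric."""
    best_lr, best_val = None, float("-inf")
    for lr in LEARNING_RATES:
        val_metric = train_and_eval(lr=lr, batch_size=batch_size)
        if val_metric > best_val:
            best_lr, best_val = lr, val_metric
    return best_lr
```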
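The Experiment Setup row describes a mini-batch of 64 split into 16 micro-batches, so each clipped-and-noised gradient is an average over 4 examples. Below is a sketch of that privatization step, assuming a hypothetical `grad_fn` that returns the average gradient of a micro-batch; the final ε would come from a privacy accountant with δ set to 1/N, which is not shown here.

```python
import numpy as np

def private_minibatch_grad(grad_fn, minibatch, w, C=0.1, sigma=1.0, n_micro=16):
    """Clip each micro-batch gradient to norm C, add Gaussian noise scaled by
    sigma * C to the sum, and return the averaged privatized gradient (sketch)."""
    micro_batches = np.array_split(minibatch, n_micro)        # 64 examples -> 16 micro-batches of 4
    total = np.zeros_like(w)
    for mb in micro_batches:
        g = grad_fn(w, mb)                                     # average gradient over the micro-batch
        g = g * min(1.0, C / (np.linalg.norm(g) + 1e-12))      # clip to norm at most C
        total += g
    total += sigma * C * np.random.normal(size=w.shape)        # Gaussian mechanism on the clipped sum
    return total / n_micro
```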