Private Adaptive Gradient Methods for Convex Optimization

Authors: Hilal Asi, John Duchi, Alireza Fallah, Omid Javidbakht, Kunal Talwar

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conclude the paper with several experiments to demonstrate the performance of the PAGAN and PASAN algorithms. We perform experiments both on synthetic data, where we may control all aspects of the experiment, and on a real-world example: training large-scale private language models.
Researcher Affiliation | Collaboration | Hilal Asi* (1,2), John Duchi (2,3), Alireza Fallah (4,1), Omid Javidbakht (5), Kunal Talwar (5). Affiliations: (1) work done while interning at Apple; (2) Department of Electrical Engineering, Stanford University; (3) Department of Statistics, Stanford University; (4) Department of Electrical Engineering & Computer Science, MIT; (5) Apple.
Pseudocode | Yes | Algorithm 1: Private Adaptive SGD with Adaptive Noise (PASAN); Algorithm 2: Private Adagrad with Adaptive Noise (PAGAN); Algorithm 3: Private Second Moment Estimation. (A hedged sketch of a private Adagrad-style step appears after the table.)
Open Source Code | Yes | The code is available online at https://github.com/apple/ml-private-adaptive-gradient-methods.
Open Datasets | Yes | We train a variant of a recurrent neural network with Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) on the WikiText-2 dataset (Merity et al., 2017), which is split into train, validation, and test sets.
Dataset Splits | Yes | We train a variant of a recurrent neural network with Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) on the WikiText-2 dataset (Merity et al., 2017), which is split into train, validation, and test sets. We further split the train set into 59,674 data points, where each data point has 35 tokens. (See the chunking sketch after the table.)
Hardware Specification | No | The paper mentions 'a standard workstation without any accelerators' for hyper-parameter tuning, but does not provide specific hardware details (e.g., CPU/GPU models, memory) for the main experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | In our experiments, we use the parameters n = 5000, d = 100, σ_j = j^{3/2}, τ = 0.01, and the batch size for all methods is b = 70. As optimization methods are sensitive to stepsize choice in general (even non-privately (Asi & Duchi, 2019)), we run each method with different values of the initial stepsize in {0.005, 0.01, 0.05, 0.1, 0.15, 0.2, 0.4, 0.5, 1.0} to find the best stepsize value. ... we perform a hyper-parameter search over three algorithm-specific constants: a step-size multiplier α ∈ {0.1, 0.2, 0.4, 0.8, 1.0, 10.0, 50.0}, mini-batch size b ∈ {50, 100, 150, 200, 250}, and projection threshold B ∈ {0.05, 0.1, 0.5, 1.0}. (A grid-search sketch follows the table.)
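
To illustrate the flavour of the algorithms listed under Pseudocode, the following is a minimal sketch of a generic differentially private Adagrad-style step with per-example clipping and isotropic Gaussian noise. It is not the paper's Algorithm 2: PAGAN additionally adapts the noise to a privately estimated second moment (Algorithm 3), which this toy omits. The function name, defaults, and noise calibration below are illustrative assumptions.

```python
# Hedged sketch (not the paper's PAGAN/PASAN): a generic DP Adagrad-style step.
import numpy as np

def dp_adagrad_step(w, per_example_grads, accum, lr=0.1, clip_norm=1.0,
                    noise_mult=1.0, eps=1e-8, rng=np.random.default_rng(0)):
    """One private Adagrad-flavoured update on parameter vector w (shape (d,))."""
    b, d = per_example_grads.shape
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Average and add isotropic Gaussian noise calibrated to the clipping norm.
    noisy_grad = clipped.mean(axis=0) + rng.normal(scale=noise_mult * clip_norm / b, size=d)
    # Adagrad: accumulate squared (noisy) gradients and rescale coordinate-wise.
    accum += noisy_grad ** 2
    w = w - lr * noisy_grad / (np.sqrt(accum) + eps)
    return w, accum
```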
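
The Dataset Splits row reports 59,674 training data points of 35 tokens each. A minimal sketch of how such fixed-length examples can be produced from a tokenized corpus is shown below; the function name and the dummy token ids are illustrative, not the paper's preprocessing pipeline.

```python
# Hedged sketch: chunk a tokenized training corpus into 35-token data points.
def chunk_tokens(token_ids, seq_len=35):
    """Drop the trailing remainder and return a list of seq_len-token examples."""
    n_examples = len(token_ids) // seq_len
    return [token_ids[i * seq_len:(i + 1) * seq_len] for i in range(n_examples)]

# With the real WikiText-2 train split this would yield the 59,674 examples
# reported above; here dummy ids give 100 // 35 = 2 examples of 35 tokens.
examples = chunk_tokens(list(range(100)), seq_len=35)
```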
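
Finally, the hyper-parameter grids quoted in the Experiment Setup row can be swept with a plain grid search. The sketch below assumes a placeholder evaluate function (e.g., a private training run scored on validation perplexity); only the three grids themselves are taken from the paper.

```python
# Hedged sketch of the reported hyper-parameter sweep as an exhaustive grid.
from itertools import product

alphas = [0.1, 0.2, 0.4, 0.8, 1.0, 10.0, 50.0]   # step-size multiplier alpha
batch_sizes = [50, 100, 150, 200, 250]           # mini-batch size b
thresholds = [0.05, 0.1, 0.5, 1.0]               # projection threshold B

def evaluate(alpha, b, B):
    """Placeholder objective; replace with a private training + validation run."""
    return 0.0

# Pick the configuration with the lowest score over the full grid.
best = min(product(alphas, batch_sizes, thresholds), key=lambda cfg: evaluate(*cfg))
print("best (alpha, b, B):", best)
```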