Differentially Private Adaptive Optimization with Delayed Preconditioners

Authors: Tian Li, Manzil Zaheer, Ken Liu, Sashank J. Reddi, Hugh Brendan McMahan, Virginia Smith

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To better understand these performance gains, we theoretically and empirically analyze the method to study the effect of using delayed preconditioners, including trade-offs that emerge between the noise reduction and staleness. We conduct extensive experiments to showcase the effectiveness of DP2, which can significantly improve model utility for a given privacy budget across text and recommendation benchmarks.
Researcher Affiliation | Collaboration | Carnegie Mellon University, Google DeepMind, Google Research; {litian,ziyuliu,smithv}@cs.cmu.edu, {manzilzaheer,sashank,mcmahan}@google.com
Pseudocode | Yes | Algorithm 1: DP2-RMSprop: Delayed Preconditioners for Differentially Private RMSprop. (An illustrative sketch of the delayed-preconditioner update appears after this table.)
Open Source Code | Yes | Our code is publicly available at github.com/kenziyuliu/DP2.
Open Datasets | Yes | IMDB (Maas et al., 2011) is a binary classification dataset on sentiment analysis for movie reviews that includes 25,000/25,000 training/test samples. Stack Overflow (Kaggle, 2022; TensorFlow Federated, 2022)... We randomly sample 246,092 sentences for training and 61,719 for testing... MovieLens-100k (Harper & Konstan, 2015)... We randomly partition them for training and evaluation.
Dataset Splits | No | The paper specifies training and testing/evaluation partitions but does not explicitly describe a separate validation set with specific sizes or percentages. It mentions that hyperparameters are selected using 'training set metrics'.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies | No | Our experiments are implemented in JAX (Bradbury et al., 2018) with Haiku (Hennigan et al., 2020) to auto-vectorize over the per-example operations (e.g. per-example clipping) for substantial speedups (Subramani et al., 2021). The versions for JAX and Haiku are not specified. (A minimal JAX sketch of the per-example clipping pattern follows the table.)
Experiment Setup | Yes | Unless explicitly stated, we report results with the best grid-searched hyperparameters. Note that for DP2 we tune the learning rates and clipping thresholds separately for private SGD iterations and private adaptive (RMSProp) iterations. See Appendix C.2 for hyperparameter details. In Appendix C.2, specific ranges for learning rates, clipping thresholds, and the delay parameter 's' are provided (e.g., 'Learning rates: We grid search over {0.03, 0.1, 0.3, 1, 3, 5} for SGD / AdaGrad update rules and from {0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3} for the RMSProp update rule.'). (A sketch of the corresponding grid-search loop also follows the table.)
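
For context on the Pseudocode row: below is a minimal, illustrative sketch of the delayed-preconditioner idea behind Algorithm 1 (DP2-RMSprop), as we read it from the abstract quoted above. Noisy gradients from a block of private SGD steps are averaged into a less-noisy estimate that refreshes the RMSprop preconditioner, which then stays fixed (delayed) while preconditioning the next block of private adaptive steps. All function names, constants, and the exact phase schedule here are assumptions for illustration only, not the authors' reference implementation (see github.com/kenziyuliu/DP2 for the real code).

    import numpy as np

    def privatize(per_example_grads, clip_norm, noise_mult, rng):
        # Standard DP-SGD privatization: per-example clipping followed by Gaussian noise.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
        noise = rng.normal(0.0, noise_mult * clip_norm, size=per_example_grads.shape[1])
        return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

    def dp2_rmsprop_sketch(grad_fn, w, rounds, s=10, lr_sgd=0.3, lr_rms=0.01,
                           clip_sgd=1.0, clip_rms=1.0, noise_mult=1.0,
                           beta=0.9, eps=1e-8, seed=0):
        # grad_fn(w) is assumed to return a (batch, dim) array of per-example gradients.
        rng = np.random.default_rng(seed)
        v = np.ones_like(w)                      # delayed preconditioner
        for _ in range(rounds):
            # Phase 1: s private SGD steps; the noisy gradients are averaged so the
            # preconditioner is later refreshed from a less-noisy estimate.
            g_bar = np.zeros_like(w)
            for _ in range(s):
                g = privatize(grad_fn(w), clip_sgd, noise_mult, rng)
                g_bar += g / s
                w = w - lr_sgd * g
            v = beta * v + (1.0 - beta) * g_bar ** 2   # refresh the (now stale) preconditioner
            # Phase 2: s private adaptive steps that reuse the stale preconditioner,
            # with a separately tuned learning rate and clipping threshold.
            for _ in range(s):
                g = privatize(grad_fn(w), clip_rms, noise_mult, rng)
                w = w - lr_rms * g / (np.sqrt(v) + eps)
        return w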
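On the Software Dependencies row: the per-example clipping that the paper auto-vectorizes with JAX/Haiku typically follows the pattern sketched below, where jax.vmap maps a single-example clipped-gradient function over the batch. The toy linear loss is a stand-in for the paper's Haiku-defined models and is purely illustrative.

    import jax
    import jax.numpy as jnp

    def loss_fn(params, x, y):
        # Toy single-example loss; a placeholder for the paper's Haiku models.
        return (jnp.dot(x, params) - y) ** 2

    def clipped_grad(params, x, y, clip_norm):
        # Gradient of one example, rescaled so its L2 norm is at most clip_norm.
        g = jax.grad(loss_fn)(params, x, y)
        return g * jnp.minimum(1.0, clip_norm / (jnp.linalg.norm(g) + 1e-12))

    # vmap auto-vectorizes the single-example function over the batch dimension,
    # producing all per-example clipped gradients in one call.
    per_example_clipped = jax.vmap(clipped_grad, in_axes=(None, 0, 0, None))

    params = jnp.zeros(5)
    x, y = jnp.ones((8, 5)), jnp.zeros(8)
    grads = per_example_clipped(params, x, y, 1.0)   # shape (8, 5)
    noisy_mean = grads.mean(axis=0)                  # noise addition would follow here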
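Finally, for the Experiment Setup row, the learning-rate ranges quoted from Appendix C.2 translate into a grid search along the lines below. The dictionary keys and the driver loop are hypothetical; the clipping-threshold and delay grids (also tuned in Appendix C.2) are omitted because their exact values are not quoted here.

    import itertools

    # Learning-rate grids as quoted from Appendix C.2; clipping thresholds and
    # the delay parameter s are tuned similarly but not listed in this report.
    grids = {
        "lr_sgd_adagrad": [0.03, 0.1, 0.3, 1, 3, 5],
        "lr_rmsprop": [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3],
    }

    for values in itertools.product(*grids.values()):
        config = dict(zip(grids.keys(), values))
        # train_and_evaluate(config)  # hypothetical entry point; best config is
        # chosen by training-set metrics, per the Dataset Splits row above.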