Differentially Private Adaptive Optimization with Delayed Preconditioners

Authors: Tian Li, Manzil Zaheer, Ken Liu, Sashank J. Reddi, Hugh Brendan McMahan, Virginia Smith

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To better understand these performance gains, we theoretically and empirically analyze the method to study the effect of using delayed preconditioners, including trade-offs that emerge between the noise reduction and staleness. We conduct extensive experiments to showcase the effectiveness of DP2, which can significantly improve model utility for a given privacy budget across text and recommendation benchmarks.
Researcher Affiliation | Collaboration | Carnegie Mellon University, Google DeepMind, Google Research; {litian,ziyuliu,smithv}@cs.cmu.edu, {manzilzaheer,sashank,mcmahan}@google.com
Pseudocode | Yes | Algorithm 1: DP2-RMSprop: Delayed Preconditioners for Differentially Private RMSprop. (An illustrative sketch of the delayed-preconditioner update appears after this table.)
Open Source Code | Yes | Our code is publicly available at github.com/kenziyuliu/DP2.
Open Datasets | Yes | IMDB (Maas et al., 2011) is a binary classification dataset on sentiment analysis for movie reviews that includes 25,000/25,000 training/test samples. Stack Overflow (Kaggle, 2022; TensorFlow Federated, 2022)... We randomly sample 246,092 sentences for training and 61,719 for testing... MovieLens-100k (Harper & Konstan, 2015)... We randomly partition them for training and evaluation.
Dataset Splits | No | The paper specifies training and testing/evaluation partitions but does not explicitly describe a separate validation set with specific sizes or percentages. It mentions that hyperparameters are selected using 'training set metrics'.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies | No | Our experiments are implemented in JAX (Bradbury et al., 2018) with Haiku (Hennigan et al., 2020) to auto-vectorize over the per-example operations (e.g. per-example clipping) for substantial speedups (Subramani et al., 2021). The versions for JAX and Haiku are not specified. (A minimal JAX sketch of the per-example clipping pattern follows the table.)
Experiment Setup | Yes | Unless explicitly stated, we report results with the best grid-searched hyperparameters. Note that for DP2 we tune the learning rates and clipping thresholds separately for private SGD iterations and private adaptive (RMSProp) iterations. See Appendix C.2 for hyperparameter details. In Appendix C.2, specific ranges for learning rates, clipping thresholds, and the delay parameter 's' are provided (e.g., 'Learning rates: We grid search over {0.03, 0.1, 0.3, 1, 3, 5} for SGD / AdaGrad update rules and from {0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3} for the RMSProp update rule.'). (A sketch of the corresponding grid-search loop also follows the table.)
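
For context on the Pseudocode row: below is a minimal, illustrative sketch of the delayed-preconditioner idea behind Algorithm 1 (DP2-RMSprop), as we read it from the abstract quoted above. Noisy gradients from a block of private SGD steps are averaged into a less-noisy estimate that refreshes the RMSprop preconditioner, which then stays fixed (delayed) while preconditioning the next block of private adaptive steps. All function names, constants, and the exact phase schedule here are assumptions for illustration only, not the authors' reference implementation (see github.com/kenziyuliu/DP2 for the real code).

    import numpy as np

    def privatize(per_example_grads, clip_norm, noise_mult, rng):
        # Standard DP-SGD privatization: per-example clipping followed by Gaussian noise.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
        noise = rng.normal(0.0, noise_mult * clip_norm, size=per_example_grads.shape[1])
        return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

    def dp2_rmsprop_sketch(grad_fn, w, rounds, s=10, lr_sgd=0.3, lr_rms=0.01,
                           clip_sgd=1.0, clip_rms=1.0, noise_mult=1.0,
                           beta=0.9, eps=1e-8, seed=0):
        # grad_fn(w) is assumed to return a (batch, dim) array of per-example gradients.
        rng = np.random.default_rng(seed)
        v = np.ones_like(w)                      # delayed preconditioner
        for _ in range(rounds):
            # Phase 1: s private SGD steps; the noisy gradients are averaged so the
            # preconditioner is later refreshed from a less-noisy estimate.
            g_bar = np.zeros_like(w)
            for _ in range(s):
                g = privatize(grad_fn(w), clip_sgd, noise_mult, rng)
                g_bar += g / s
                w = w - lr_sgd * g
            v = beta * v + (1.0 - beta) * g_bar ** 2   # refresh the (now stale) preconditioner
            # Phase 2: s private adaptive steps that reuse the stale preconditioner,
            # with a separately tuned learning rate and clipping threshold.
            for _ in range(s):
                g = privatize(grad_fn(w), clip_rms, noise_mult, rng)
                w = w - lr_rms * g / (np.sqrt(v) + eps)
        return w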
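On the Software Dependencies row: the per-example clipping that the paper auto-vectorizes with JAX/Haiku typically follows the pattern sketched below, where jax.vmap maps a single-example clipped-gradient function over the batch. The toy linear loss is a stand-in for the paper's Haiku-defined models and is purely illustrative.

    import jax
    import jax.numpy as jnp

    def loss_fn(params, x, y):
        # Toy single-example loss; a placeholder for the paper's Haiku models.
        return (jnp.dot(x, params) - y) ** 2

    def clipped_grad(params, x, y, clip_norm):
        # Gradient of one example, rescaled so its L2 norm is at most clip_norm.
        g = jax.grad(loss_fn)(params, x, y)
        return g * jnp.minimum(1.0, clip_norm / (jnp.linalg.norm(g) + 1e-12))

    # vmap auto-vectorizes the single-example function over the batch dimension,
    # producing all per-example clipped gradients in one call.
    per_example_clipped = jax.vmap(clipped_grad, in_axes=(None, 0, 0, None))

    params = jnp.zeros(5)
    x, y = jnp.ones((8, 5)), jnp.zeros(8)
    grads = per_example_clipped(params, x, y, 1.0)   # shape (8, 5)
    noisy_mean = grads.mean(axis=0)                  # noise addition would follow here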
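Finally, for the Experiment Setup row, the learning-rate ranges quoted from Appendix C.2 translate into a grid search along the lines below. The dictionary keys and the driver loop are hypothetical; the clipping-threshold and delay grids (also tuned in Appendix C.2) are omitted because their exact values are not quoted here.

    import itertools

    # Learning-rate grids as quoted from Appendix C.2; clipping thresholds and
    # the delay parameter s are tuned similarly but not listed in this report.
    grids = {
        "lr_sgd_adagrad": [0.03, 0.1, 0.3, 1, 3, 5],
        "lr_rmsprop": [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3],
    }

    for values in itertools.product(*grids.values()):
        config = dict(zip(grids.keys(), values))
        # train_and_evaluate(config)  # hypothetical entry point; best config is
        # chosen by training-set metrics, per the Dataset Splits row above.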