Practical and Private (Deep) Learning Without Sampling or Shuffling

Authors: Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification. In Section 5, we study some trade-offs between privacy/utility/computation for DP-FTRL and DP-SGD. We conduct our experiments on four benchmark data sets: MNIST, CIFAR-10, EMNIST, and Stack Overflow. We start by fixing the computation available to the techniques, and observing privacy/utility tradeoffs.
Researcher Affiliation Industry Peter Kairouz 1, Brendan McMahan 1, Shuang Song 1, Om Thakkar 1, Abhradeep Thakurta 1, Zheng Xu 1; 1 Google.
Pseudocode Yes Algorithm 1 (A_FTRL): Differentially Private Follow-The-Regularized-Leader (DP-FTRL)
Open Source Code Yes The code is open sourced: https://github.com/google-research/federated/tree/master/dp_ftrl for FL experiments, and https://github.com/google-research/DP-FTRL for centralized learning.
Open Datasets Yes Datasets: We conduct our evaluation on three image classification tasks, MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky, 2009), EMNIST (ByMerge split) (Cohen et al., 2017); and a next word prediction task on the Stack Overflow data set (Overflow, 2018).
Dataset Splits No The paper mentions 'test accuracy' and uses standard benchmark datasets, but it does not explicitly provide specific details about the training, validation, and test splits (e.g., percentages or sample counts) used for reproduction.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, cloud instance types) used to run the experiments.
Software Dependencies No The paper mentions software like TensorFlow-privacy and PyTorch but does not provide specific version numbers for any software dependencies, which are required for reproducible software description.
Experiment Setup Yes For all experiments with DP, we set the privacy parameter δ to 10^-5 on MNIST and CIFAR-10, and 10^-6 on EMNIST and Stack Overflow... We fix the (samples in mini-batch, training iterations) to (250, 4800) for MNIST, (500, 10000) for CIFAR-10, and (500, 69750) for EMNIST. Our goal is to achieve equal or better tradeoffs while processing data in an arbitrary order (i.e., without relying on any amplification). ... DP-FTRLM with momentum 0.9 always outperforms DP-FTRL.
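To make the pseudocode row concrete, below is a minimal, hypothetical sketch of the tree-aggregation idea underlying DP-FTRL: the model iterate at step t is driven by a *noisy prefix sum* of clipped gradients, where the noise is assembled from cached per-node Gaussian draws on a binary tree, so each gradient contributes to at most log2(T) draws. This is not the paper's implementation; function names are ours, gradients are scalars assumed pre-clipped, and per-example clipping, restarts, and momentum (DP-FTRLM) are omitted.

```python
import random

def tree_prefix_noise(t, sigma, cache, rng):
    """Noise for the prefix-sum query at step t (1-indexed).

    [1, t] is decomposed into dyadic intervals (tree nodes); each node's
    Gaussian draw is sampled once and cached, so every query's noise is a
    sum of at most log2(t) independent draws, and each gradient touches
    at most log2(T) nodes -- the core of tree aggregation.
    """
    noise, start = 0.0, 0
    for bit in reversed(range(t.bit_length())):
        if (t >> bit) & 1:
            key = (bit, start)  # node covering steps [start+1, start+2**bit]
            if key not in cache:
                cache[key] = rng.gauss(0.0, sigma)
            noise += cache[key]
            start += 1 << bit
    return noise

def dp_ftrl(grads, sigma, lr, seed=0):
    """Scalar DP-FTRL sketch: iterates come from the noisy prefix sum of
    (pre-clipped) gradients, not from per-step noisy gradients as in
    DP-SGD, so no sampling/shuffling amplification is needed."""
    rng = random.Random(seed)
    cache, prefix, iterates = {}, 0.0, []
    for t, g in enumerate(grads, start=1):
        prefix += g  # true running sum of clipped gradients
        noisy_prefix = prefix + tree_prefix_noise(t, sigma, cache, rng)
        iterates.append(-lr * noisy_prefix)  # FTRL with quadratic regularizer
    return iterates
```

With sigma = 0 the sketch reduces to plain gradient-sum descent, which gives a quick sanity check that the prefix-sum bookkeeping is right.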