Practical and Private (Deep) Learning Without Sampling or Shuffling

Authors: Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification. In Section 5, we study some trade-offs between privacy/utility/computation for DP-FTRL and DP-SGD. We conduct our experiments on four benchmark data sets: MNIST, CIFAR-10, EMNIST, and Stack Overflow. We start by fixing the computation available to the techniques, and observing privacy/utility tradeoffs.
Researcher Affiliation Industry Peter Kairouz 1, Brendan McMahan 1, Shuang Song 1, Om Thakkar 1, Abhradeep Thakurta 1, Zheng Xu 1; 1 Google.
Pseudocode Yes Algorithm 1 (A_FTRL): Differentially Private Follow-The-Regularized-Leader (DP-FTRL)
Open Source Code Yes The code is open sourced: https://github.com/google-research/federated/tree/master/dp_ftrl for FL experiments, and https://github.com/google-research/DP-FTRL for centralized learning.
Open Datasets Yes Datasets: We conduct our evaluation on three image classification tasks, MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky, 2009), EMNIST (ByMerge split) (Cohen et al., 2017); and a next word prediction task on the Stack Overflow data set (Overflow, 2018).
Dataset Splits No The paper mentions 'test accuracy' and uses standard benchmark datasets, but it does not explicitly provide specific details about the training, validation, and test splits (e.g., percentages or sample counts) used for reproduction.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, cloud instance types) used to run the experiments.
Software Dependencies No The paper mentions software like TensorFlow-privacy and PyTorch but does not provide specific version numbers for any software dependencies, which are required for reproducible software description.
Experiment Setup Yes For all experiments with DP, we set the privacy parameter δ to 10^-5 on MNIST and CIFAR-10, and 10^-6 on EMNIST and Stack Overflow... We fix the (samples in mini-batch, training iterations) to (250, 4800) for MNIST, (500, 10000) for CIFAR-10, and (500, 69750) for EMNIST. Our goal is to achieve equal or better tradeoffs while processing data in an arbitrary order (i.e., without relying on any amplification). ... DP-FTRLM with momentum 0.9 always outperforms DP-FTRL.
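To make the pseudocode row concrete, below is a minimal, hypothetical sketch of the tree-aggregation idea underlying DP-FTRL: the model iterate at step t is driven by a *noisy prefix sum* of clipped gradients, where the noise is assembled from cached per-node Gaussian draws on a binary tree, so each gradient contributes to at most log2(T) draws. This is not the paper's implementation; function names are ours, gradients are scalars assumed pre-clipped, and per-example clipping, restarts, and momentum (DP-FTRLM) are omitted.

```python
import random

def tree_prefix_noise(t, sigma, cache, rng):
    """Noise for the prefix-sum query at step t (1-indexed).

    [1, t] is decomposed into dyadic intervals (tree nodes); each node's
    Gaussian draw is sampled once and cached, so every query's noise is a
    sum of at most log2(t) independent draws, and each gradient touches
    at most log2(T) nodes -- the core of tree aggregation.
    """
    noise, start = 0.0, 0
    for bit in reversed(range(t.bit_length())):
        if (t >> bit) & 1:
            key = (bit, start)  # node covering steps [start+1, start+2**bit]
            if key not in cache:
                cache[key] = rng.gauss(0.0, sigma)
            noise += cache[key]
            start += 1 << bit
    return noise

def dp_ftrl(grads, sigma, lr, seed=0):
    """Scalar DP-FTRL sketch: iterates come from the noisy prefix sum of
    (pre-clipped) gradients, not from per-step noisy gradients as in
    DP-SGD, so no sampling/shuffling amplification is needed."""
    rng = random.Random(seed)
    cache, prefix, iterates = {}, 0.0, []
    for t, g in enumerate(grads, start=1):
        prefix += g  # true running sum of clipped gradients
        noisy_prefix = prefix + tree_prefix_noise(t, sigma, cache, rng)
        iterates.append(-lr * noisy_prefix)  # FTRL with quadratic regularizer
    return iterates
```

With sigma = 0 the sketch reduces to plain gradient-sum descent, which gives a quick sanity check that the prefix-sum bookkeeping is right.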