Practical and Private (Deep) Learning Without Sampling or Shuffling
Authors: Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, Zheng Xu
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification. In Section 5, we study some trade-offs between privacy/utility/computation for DP-FTRL and DP-SGD. We conduct our experiments on four benchmark data sets: MNIST, CIFAR-10, EMNIST, and Stack Overflow. We start by fixing the computation available to the techniques, and observing privacy/utility tradeoffs. |
| Researcher Affiliation | Industry | Peter Kairouz¹, Brendan McMahan¹, Shuang Song¹, Om Thakkar¹, Abhradeep Thakurta¹, Zheng Xu¹ (¹Google). |
| Pseudocode | Yes (see the sketch below the table) | Algorithm 1 DP-FTRL: Differentially Private Follow-The-Regularized-Leader |
| Open Source Code | Yes | The code is open sourced: https://github.com/google-research/federated/tree/master/dp_ftrl for FL experiments, and https://github.com/google-research/DP-FTRL for centralized learning. |
| Open Datasets | Yes | Datasets: We conduct our evaluation on three image classification tasks, MNIST (LeCun et al., 1998), CIFAR-10 (Krizhevsky, 2009), EMNIST (ByMerge split) (Cohen et al., 2017); and a next word prediction task on the Stack Overflow data set (Overflow, 2018). |
| Dataset Splits | No | The paper mentions 'test accuracy' and uses standard benchmark datasets, but it does not explicitly provide specific details about the training, validation, and test splits (e.g., percentages or sample counts) used for reproduction. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software such as TensorFlow Privacy and PyTorch but does not provide version numbers for any software dependencies, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes (summarized in the snippet below the table) | For all experiments with DP, we set the privacy parameter δ to 10⁻⁵ on MNIST and CIFAR-10, and 10⁻⁶ on EMNIST and Stack Overflow... We fix the (samples in mini-batch, training iterations) to (250, 4800) for MNIST, (500, 10000) for CIFAR-10, and (500, 69750) for EMNIST. Our goal is to achieve equal or better tradeoffs while processing data in an arbitrary order (i.e., without relying on any amplification). ... DP-FTRLM with momentum 0.9 always outperforms DP-FTRL. |
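
Algorithm 1 in the paper (DP-FTRL) replaces DP-SGD's fresh per-step noise with tree-aggregated noise on prefix sums of clipped gradients, which is what removes the need for sampling or shuffling. The following is a minimal, self-contained Python sketch of that idea under stated assumptions; it is not the authors' implementation (their code lives in the linked repositories), and the function names, greedy dyadic decomposition, and noise scaling below are illustrative choices, not the paper's exact recipe.

```python
import numpy as np

def clip_grad(g, clip_norm):
    """Scale g so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(g)
    return g if norm <= clip_norm else g * (clip_norm / norm)

def tree_prefix_noise(t, sigma, dim, rng, node_noise):
    """Noise covering the prefix [1, t] under the binary-tree mechanism.

    The prefix is split greedily into dyadic intervals; each interval
    (tree node) gets one Gaussian noise vector, drawn once and memoized,
    so every released prefix sum is perturbed by O(log t) noise vectors.
    """
    total = np.zeros(dim)
    start, remaining = 1, t
    while remaining > 0:
        span = 1 << (remaining.bit_length() - 1)  # largest power of two <= remaining
        node = (start, start + span - 1)
        if node not in node_noise:
            node_noise[node] = rng.normal(0.0, sigma, size=dim)
        total += node_noise[node]
        start += span
        remaining -= span
    return total

def dp_ftrl(batches, grad_fn, theta0, clip_norm, noise_mult, lr, seed=0):
    """Illustrative DP-FTRL loop: release noisy prefix sums of clipped gradients."""
    rng = np.random.default_rng(seed)
    dim = theta0.shape[0]
    sigma = noise_mult * clip_norm  # per-node noise std at sum-of-gradients scale
    node_noise = {}                 # memoized noise for each tree node
    grad_sum = np.zeros(dim)
    theta = theta0.copy()
    for t, batch in enumerate(batches, start=1):
        # Sum of per-example clipped gradients for this round.
        grad_sum += np.sum([clip_grad(grad_fn(theta, x), clip_norm) for x in batch], axis=0)
        noisy_sum = grad_sum + tree_prefix_noise(t, sigma, dim, rng, node_noise)
        # FTRL closed form with a quadratic regularizer centered at theta0.
        theta = theta0 - lr * noisy_sum
    return theta
```

The momentum variant the table mentions (DP-FTRLM, momentum 0.9) is not sketched here; the key point of the tree mechanism is that each prefix sum sees only logarithmically many noise draws, which is what lets DP-FTRL match amplified DP-SGD without any amplification by sampling or shuffling.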
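
For quick reference, the DP and training settings quoted in the Experiment Setup row can be collected in one place. This is only a restatement of the quoted numbers; values the row does not quote (learning rates, clip norms, noise multipliers, Stack Overflow batch size and round count) are deliberately omitted rather than guessed.

```python
# Settings quoted in the Experiment Setup row above.
EXPERIMENT_SETUP = {
    "MNIST":          {"delta": 1e-5, "batch_size": 250, "iterations": 4800},
    "CIFAR-10":       {"delta": 1e-5, "batch_size": 500, "iterations": 10000},
    "EMNIST":         {"delta": 1e-6, "batch_size": 500, "iterations": 69750},
    "Stack Overflow": {"delta": 1e-6},  # batch size / rounds not quoted in the row
}
```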