Generative Models for Effective ML on Private, Decentralized Datasets
Authors: Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper demonstrates that generative models trained using federated methods and with formal differential privacy guarantees can be used effectively to debug many commonly occurring data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs. |
| Researcher Affiliation | Industry | Sean Augenstein Google Inc. saugenst@google.com H. Brendan McMahan Google Inc. mcmahan@google.com Daniel Ramage Google Inc. dramage@google.com Swaroop Ramaswamy Google Inc. swaroopram@google.com Peter Kairouz Google Inc. kairouz@google.com Mingqing Chen Google Inc. mingqing@google.com Rajiv Mathews Google Inc. mathews@google.com Blaise Aguera y Arcas Google Inc. blaisea@google.com |
| Pseudocode | Yes | Algorithm 1 (DP-FedAvg-GAN) describes how to train a GAN under FL and DP. ... Algorithm 2: DP-FedAvg with fixed-size federated rounds, used to train word- and char-LMs in Section 5. |
| Open Source Code | Yes | As an initial step to stimulate research in this area, we provide an open-source implementation of our DP federated GAN code (used to generate the results in Section 6). https://github.com/tensorflow/federated/tree/master/tensorflow_federated/python/research/gans |
| Open Datasets | Yes | We simulate the scenario using a dataset derived from Stack Overflow questions and answers (hosted by TensorFlow Federated (Ingerman & Ostrowski, 2019)). It provides a realistic proxy for federated data, since we can associate all posts by an author as a particular user (see Appendix B.1 for details). ... The dataset is available via the TensorFlow Federated open-source software framework (Ingerman & Ostrowski, 2019). ... We use the Federated EMNIST dataset (Caldas et al., 2018). ... It is available via the TensorFlow Federated open-source software framework (Ingerman & Ostrowski, 2019). |
| Dataset Splits | Yes | The corpus is divided into train, held-out, and test parts. Table 4 summarizes the statistics on number of users and number of sentences for each partition. |
| Hardware Specification | No | The paper describes experiments run on 'devices' or 'mobile phones' in a federated learning setting but does not specify the hardware (e.g., GPU, CPU models, or cloud instance types) used to conduct their simulations or training. |
| Software Dependencies | No | The paper mentions TensorFlow Federated and a GAN code tutorial, but does not provide specific version numbers for software dependencies such as TensorFlow Federated itself, or other libraries such as PyTorch or CUDA. |
| Experiment Setup | Yes | The model is trained for 2,000 rounds. A server learning rate of 1.0, an on-device learning rate of 0.5, and Nesterov momentum of 0.99 are used. ... Privacy Hyperparameters Table 5 gives the privacy hyperparameters used with DP-FedAvg (Algorithm 2). ... Hyperparameters Table 9 gives the various hyperparameters used with the DP-FedAvg-GAN algorithm to yield generators that produced the images displayed in Figures 3, 6, 7, and 8. |
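The table's Open Datasets row notes that the Stack Overflow corpus becomes a federated dataset by associating all posts by one author with one user. A minimal sketch of that partitioning idea, using a hypothetical `posts_to_clients` helper on plain `(author, text)` pairs (the real dataset ships pre-partitioned inside TensorFlow Federated):

```python
from collections import defaultdict

def posts_to_clients(posts):
    """Group (author, text) pairs so that each author becomes one
    federated 'client', mirroring how the Stack Overflow corpus is
    partitioned into users. Illustrative only: the actual dataset is
    distributed already partitioned via TensorFlow Federated."""
    clients = defaultdict(list)
    for author, text in posts:
        clients[author].append(text)
    return dict(clients)

# Three posts from two authors yield two clients.
clients = posts_to_clients([("a", "q1"), ("b", "q2"), ("a", "a1")])
# → {"a": ["q1", "a1"], "b": ["q2"]}
```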
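The Pseudocode row points to Algorithm 1 (DP-FedAvg-GAN) and Algorithm 2 (DP-FedAvg). A minimal NumPy sketch of the aggregation step shared by both: clip each client's model delta to an L2 bound, average, and add Gaussian noise calibrated to the clip norm. The function name and signature are illustrative, not the paper's implementation:

```python
import numpy as np

def dp_fedavg_aggregate(client_updates, clip_norm, noise_multiplier, rng):
    """One round of differentially private update aggregation:
    per-client L2 clipping, averaging, and Gaussian noising."""
    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        # Scale down any update whose L2 norm exceeds clip_norm.
        clipped.append(delta * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    # Noise stddev scales with clip_norm and shrinks with cohort size,
    # following the Gaussian mechanism.
    stddev = noise_multiplier * clip_norm / len(client_updates)
    return mean + rng.normal(0.0, stddev, size=mean.shape)
```

With `noise_multiplier=0` the function reduces to plain clipped averaging, which makes the clipping behavior easy to check in isolation.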
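The Experiment Setup row quotes a server learning rate of 1.0, an on-device learning rate of 0.5, and Nesterov momentum of 0.99. A sketch of how such a server-side Nesterov momentum step could apply an aggregated client delta, using the TensorFlow-style Nesterov formulation; the paper's exact update rule may differ in detail:

```python
import numpy as np

# Hyperparameters quoted in the paper's Section 5 setup.
SERVER_LR = 1.0
MOMENTUM = 0.99   # Nesterov momentum on the server
CLIENT_LR = 0.5   # used on-device, shown for completeness

def server_nesterov_step(weights, velocity, avg_update):
    """Apply one server update, treating the negated averaged client
    delta as a pseudo-gradient (illustrative formulation)."""
    grad = -avg_update
    velocity = MOMENTUM * velocity + grad
    weights = weights - SERVER_LR * (grad + MOMENTUM * velocity)
    return weights, velocity
```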