Generative Models for Effective ML on Private, Decentralized Datasets
Authors: Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, Blaise Aguera y Arcas
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper demonstrates that generative models trained using federated methods and with formal differential privacy guarantees can be used effectively to debug many commonly occurring data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs. |
| Researcher Affiliation | Industry | Sean Augenstein Google Inc. saugenst@google.com H. Brendan McMahan Google Inc. mcmahan@google.com Daniel Ramage Google Inc. dramage@google.com Swaroop Ramaswamy Google Inc. swaroopram@google.com Peter Kairouz Google Inc. kairouz@google.com Mingqing Chen Google Inc. mingqing@google.com Rajiv Mathews Google Inc. mathews@google.com Blaise Aguera y Arcas Google Inc. blaisea@google.com |
| Pseudocode | Yes | Algorithm 1 (DP-FedAvg-GAN) describes how to train a GAN under FL and DP. ... Algorithm 2: DP-FedAvg with fixed-size federated rounds, used to train word- and char-LMs in Section 5. |
| Open Source Code | Yes | As an initial step to stimulate research in this area, we provide an open-source implementation of our DP federated GAN code (used to generate the results in Section 6). https://github.com/tensorflow/federated/tree/master/tensorflow_federated/python/research/gans |
| Open Datasets | Yes | We simulate the scenario using a dataset derived from Stack Overflow questions and answers (hosted by TensorFlow Federated (Ingerman & Ostrowski, 2019)). It provides a realistic proxy for federated data, since we can associate all posts by an author as a particular user (see Appendix B.1 for details). ... The dataset is available via the TensorFlow Federated open-source software framework (Ingerman & Ostrowski, 2019). ... We use the Federated EMNIST dataset (Caldas et al., 2018). ... It is available via the TensorFlow Federated open-source software framework (Ingerman & Ostrowski, 2019). |
| Dataset Splits | Yes | The corpus is divided into train, held-out, and test parts. Table 4 summarizes the statistics on number of users and number of sentences for each partition. |
| Hardware Specification | No | The paper describes experiments run on 'devices' or 'mobile phones' in a federated learning setting but does not specify the hardware (e.g., GPU, CPU models, or cloud instance types) used to conduct their simulations or training. |
| Software Dependencies | No | The paper mentions TensorFlow Federated and a GAN code tutorial, but does not provide specific version numbers for software dependencies such as TensorFlow Federated itself, or other libraries such as PyTorch or CUDA. |
| Experiment Setup | Yes | The model is trained for 2,000 rounds. A server learning rate of 1.0, an on-device learning rate of 0.5, and Nesterov momentum of 0.99 are used. ... Privacy Hyperparameters Table 5 gives the privacy hyperparameters used with DP-FedAvg (Algorithm 2). ... Hyperparameters Table 9 gives the various hyperparameters used with the DP-FedAvg-GAN algorithm to yield generators that produced the images displayed in Figures 3, 6, 7, and 8. |
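The table's Open Datasets row notes that the Stack Overflow corpus becomes a federated dataset by associating all posts by one author with one user. A minimal sketch of that partitioning idea, using a hypothetical `posts_to_clients` helper on plain `(author, text)` pairs (the real dataset ships pre-partitioned inside TensorFlow Federated):

```python
from collections import defaultdict

def posts_to_clients(posts):
    """Group (author, text) pairs so that each author becomes one
    federated 'client', mirroring how the Stack Overflow corpus is
    partitioned into users. Illustrative only: the actual dataset is
    distributed already partitioned via TensorFlow Federated."""
    clients = defaultdict(list)
    for author, text in posts:
        clients[author].append(text)
    return dict(clients)

# Three posts from two authors yield two clients.
clients = posts_to_clients([("a", "q1"), ("b", "q2"), ("a", "a1")])
# → {"a": ["q1", "a1"], "b": ["q2"]}
```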
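The Pseudocode row points to Algorithm 1 (DP-FedAvg-GAN) and Algorithm 2 (DP-FedAvg). A minimal NumPy sketch of the aggregation step shared by both: clip each client's model delta to an L2 bound, average, and add Gaussian noise calibrated to the clip norm. The function name and signature are illustrative, not the paper's implementation:

```python
import numpy as np

def dp_fedavg_aggregate(client_updates, clip_norm, noise_multiplier, rng):
    """One round of differentially private update aggregation:
    per-client L2 clipping, averaging, and Gaussian noising."""
    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        # Scale down any update whose L2 norm exceeds clip_norm.
        clipped.append(delta * min(1.0, clip_norm / (norm + 1e-12)))
    mean = np.mean(clipped, axis=0)
    # Noise stddev scales with clip_norm and shrinks with cohort size,
    # following the Gaussian mechanism.
    stddev = noise_multiplier * clip_norm / len(client_updates)
    return mean + rng.normal(0.0, stddev, size=mean.shape)
```

With `noise_multiplier=0` the function reduces to plain clipped averaging, which makes the clipping behavior easy to check in isolation.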
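The Experiment Setup row quotes a server learning rate of 1.0, an on-device learning rate of 0.5, and Nesterov momentum of 0.99. A sketch of how such a server-side Nesterov momentum step could apply an aggregated client delta, using the TensorFlow-style Nesterov formulation; the paper's exact update rule may differ in detail:

```python
import numpy as np

# Hyperparameters quoted in the paper's Section 5 setup.
SERVER_LR = 1.0
MOMENTUM = 0.99   # Nesterov momentum on the server
CLIENT_LR = 0.5   # used on-device, shown for completeness

def server_nesterov_step(weights, velocity, avg_update):
    """Apply one server update, treating the negated averaged client
    delta as a pseudo-gradient (illustrative formulation)."""
    grad = -avg_update
    velocity = MOMENTUM * velocity + grad
    weights = weights - SERVER_LR * (grad + MOMENTUM * velocity)
    return weights, velocity
```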