Learning Differentially Private Recurrent Language Models
Authors: H. Brendan McMahan, Daniel Ramage, Kunal Talwar, Li Zhang
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work demonstrates that given a dataset with a sufficiently large number of users (a requirement easily met by even small internet-scale datasets), achieving differential privacy comes at the cost of increased computation, rather than decreased utility as in most prior work. We find that our private LSTM language models are quantitatively and qualitatively similar to un-noised models when trained on a large dataset. In extensive experiments in Section 3, we offer guidelines for parameter tuning when training complex models with differential privacy guarantees. |
| Researcher Affiliation | Industry | H. Brendan McMahan (mcmahan@google.com), Daniel Ramage (dramage@google.com), Kunal Talwar (kunal@google.com), Li Zhang (liqzhang@google.com) |
| Pseudocode | Yes | The pseudocode for DP-FedAvg and DP-FedSGD is given as Algorithm 1. In the remainder of this section, we introduce estimators for C) and then different clipping strategies for B). Adding the sampling procedure from A) and the noise added in D) allows us to apply the moments accountant to bound the total privacy loss of the algorithm, given in Theorem 1. Finally, we consider the properties of the moments accountant that make training on large datasets particularly attractive. Algorithm 1: The main loop for DP-FedAvg and DP-FedSGD, the only difference being in the user update function (UserUpdateFedAvg or UserUpdateFedSGD). The calls on the moments accountant M refer to the API of Abadi et al. (2016b). A minimal sketch of this main loop is given below the table. |
| Open Source Code | No | The paper mentions using an implementation of the moments accountant from Abadi et al. (2016b) and provides its GitHub link, but it does not state that the code for the methodology described in *this* paper is open-source or publicly available. |
| Open Datasets | Yes | However, to facilitate reproducibility and comparison to non-private models, our experiments are conducted on a public dataset as is standard in differential privacy research. We use a large public dataset of Reddit posts, as described by Al-Rfou et al. (2016). Critically for our purposes, each post in the database is keyed by an author, so we can group the data by these keys in order to provide user-level privacy. We preprocessed the dataset to K = 763,430 users, each with 1600 tokens. Thus, we take w_k = 1 for all users, so W = K. We write C = qK = qW for the expected number of users sampled per round. See Appendix B for details on the dataset and preprocessing. The Reddit dataset can be accessed through Google BigQuery (Reddit Comments Dataset). |
| Dataset Splits | No | The paper mentions training on the Reddit dataset and using a 'relatively small test set'. It does not explicitly mention or specify details for a separate validation set. |
| Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using an 'implementation of the moments accountant' and refers to TensorFlow (via a GitHub link in the references), but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | For these experiments, we use the FedAvg algorithm with a fixed learning rate of 6.0, which we verified was a reasonable choice in preliminary experiments. In all FedAvg experiments, we used a local batch size of B = 8, an unroll size of 10 tokens, and made E = 1 passes over the local dataset; thus FedAvg processes 80 tokens per batch, processing a user's 1600 tokens in 20 batches per round. The per-round arithmetic is checked in the second sketch below the table. |
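
Below is a minimal Python/NumPy sketch of the Algorithm 1 main loop quoted in the Pseudocode row. It is an illustration of the structure described there, not the authors' implementation: the helper names (`clip_update`, `user_update_fn`), the flat-clipping choice, and the use of flat NumPy parameter vectors are assumptions, and the moments-accountant query is indicated only as a comment.

```python
import numpy as np

def clip_update(delta, clip_norm):
    """Scale a user update so its L2 norm is at most clip_norm (flat clipping)."""
    norm = np.linalg.norm(delta)
    return delta * min(1.0, clip_norm / norm) if norm > 0 else delta

def dp_fedavg(theta0, users, user_update_fn, q, clip_norm, noise_multiplier,
              num_rounds, seed=0):
    """Noised federated-averaging loop with equal user weights (w_k = 1, W = K)."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    W = len(users)                       # total user weight, here W = K
    expected_users = q * W               # expected users sampled per round, C = qW
    # Noise scale for the fixed-denominator estimator: sigma = z * S / (qW)
    sigma = noise_multiplier * clip_norm / expected_users
    for t in range(num_rounds):
        # A) sample each user independently with probability q
        sampled = [u for u in users if rng.random() < q]
        if not sampled:
            continue
        # B) compute and clip each selected user's update
        deltas = [clip_update(user_update_fn(theta, u), clip_norm) for u in sampled]
        # C) bounded-sensitivity estimator: divide by the *expected* number
        #    of sampled users, qW, rather than the realized count
        avg_delta = np.sum(deltas, axis=0) / expected_users
        # D) add Gaussian noise calibrated to the estimator's sensitivity
        noise = rng.normal(0.0, sigma, size=theta.shape)
        theta = theta + avg_delta + noise
        # A moments accountant (Abadi et al., 2016b) would be queried here with
        # (q, noise_multiplier, t + 1) to track the cumulative (eps, delta).
    return theta
```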
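
The Open Datasets and Experiment Setup rows fix the per-round data accounting. The short check below reproduces that arithmetic; the sampling probability `q` is a purely hypothetical value chosen so that the expected number of sampled users per round is 100.

```python
# Per-round accounting implied by the quoted setup; q below is hypothetical.
K = 763_430              # users after preprocessing; w_k = 1, so W = K
tokens_per_user = 1600
B, unroll, E = 8, 10, 1  # local batch size, unroll length, local passes

tokens_per_batch = B * unroll                                  # 8 * 10 = 80
batches_per_round = E * tokens_per_user // tokens_per_batch    # 1600 / 80 = 20

q = 100 / K                        # hypothetical per-round sampling probability
expected_users_per_round = q * K   # C = qK = qW = 100

print(tokens_per_batch, batches_per_round, expected_users_per_round)
```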