Revealing and Protecting Labels in Distributed Training
Authors: Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method on 1) an image classification task and 2) an automatic speech recognition (ASR) task. Furthermore, we show that existing reconstruction techniques improve their efficacy when used in conjunction with our method. |
| Researcher Affiliation | Collaboration | Trung Dang (Boston University, trungvd@bu.edu); Om Thakkar (Google, omthkkr@google.com); Swaroop Ramaswamy (Google, swaroopram@google.com); Rajiv Mathews (Google, mathews@google.com); Peter Chin (Boston University, spchin@bu.edu); Françoise Beaufays (Google, fsb@google.com) |
| Pseudocode | Yes | Algorithm 1: Reveal the set of labels from a weight update of the projection layer (an illustrative sketch of the underlying idea follows the table) |
| Open Source Code | Yes | Our code is published at https://github.com/googleinterns/learning-bag-of-words. |
| Open Datasets | Yes | We randomly sample 100 batches of size N = 10, 50, 100 from the validation set of ImageNet. We use the ImageNet pre-trained models provided by the TensorFlow (Keras) [25] library. We demonstrate BoW prediction for utterances from LibriSpeech [28]. |
| Dataset Splits | No | We randomly sample 100 batches of size N = 10, 50, 100 from the validation set of ImageNet. While a validation set is mentioned, the paper does not specify the train/validation/test splits (e.g., percentages or counts for each split) for the entire dataset used in the experiments. |
| Hardware Specification | No | The paper mentions using 'a CPU', 'a single GPU', or '8 GPUs' for different tasks, but does not provide specific model numbers or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions the 'TensorFlow (Keras) [25] library' and 'lingvo [27] using TensorFlow [25]' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We randomly sample 100 batches of size N = 10, 50, 100 from the validation set of ImageNet. We run the reconstruction with learning rate 0.05, decaying by a factor of 2 after every 4,000 steps. The reconstruction stops once the transcript has remained unchanged for 2,000 steps (a sketch of this schedule follows the table). The training is performed on centralized training data, with batch size 96 across 8 GPUs. |
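
The pseudocode row references the paper's Algorithm 1, which reveals the set of labels in a batch from a weight update of the projection (final) layer. The snippet below is a minimal, hypothetical sketch of the well-known intuition behind such attacks for softmax cross-entropy: the gradient row for a class that is present in the batch tends to sum negative when the layer's input features are non-negative (e.g., post-ReLU). It is an illustration of the general idea, not the paper's exact algorithm, and `reveal_labels_from_projection_gradient` is a name invented here.

```python
import numpy as np

def reveal_labels_from_projection_gradient(grad_W: np.ndarray) -> set:
    """Infer which classes appear in a batch from the gradient of the
    final projection layer's weights (shape [num_classes, feature_dim]).

    For softmax cross-entropy, dL/dlogit_c = p_c - y_c, so with
    non-negative input features the gradient row of a class present in
    the batch is dominated by negative contributions.
    """
    row_sums = grad_W.sum(axis=1)
    return set(np.where(row_sums < 0)[0].tolist())

# Toy check: one-layer softmax classifier, batch of two examples with
# labels {1, 3}; all names and shapes here are illustrative only.
rng = np.random.default_rng(0)
num_classes, feat_dim = 5, 8
H = np.abs(rng.normal(size=(2, feat_dim)))           # non-negative features
W = rng.normal(size=(num_classes, feat_dim)) * 0.01  # small random weights
logits = H @ W.T
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
Y = np.zeros((2, num_classes))
Y[0, 1] = 1.0
Y[1, 3] = 1.0
grad_W = (probs - Y).T @ H / 2.0                     # mean cross-entropy gradient
print(reveal_labels_from_projection_gradient(grad_W))  # expected: {1, 3}
```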
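The experiment-setup row describes a step-decay learning-rate schedule (0.05, halved every 4,000 steps) with transcript-based early stopping (stop once the decoded transcript is stable for 2,000 steps). The sketch below captures only that control flow, assuming a `reconstruction_step(lr)` update and a `decode_transcript(state)` decoder exist; both are placeholders, not functions from the released code.

```python
def run_reconstruction(reconstruction_step, decode_transcript,
                       base_lr=0.05, decay_every=4000, decay_factor=2,
                       patience=2000, max_steps=100_000):
    """Run reconstruction with step decay and transcript-stability stopping."""
    best_transcript, steps_unchanged = None, 0
    for step in range(max_steps):
        # Decay the learning rate by `decay_factor` every `decay_every` steps.
        lr = base_lr / (decay_factor ** (step // decay_every))
        transcript = decode_transcript(reconstruction_step(lr))
        if transcript == best_transcript:
            steps_unchanged += 1
            if steps_unchanged >= patience:  # unchanged for 2,000 steps
                break
        else:
            best_transcript, steps_unchanged = transcript, 0
    return best_transcript
```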