Revealing and Protecting Labels in Distributed Training

Authors: Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, Françoise Beaufays

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our method on 1) an image classification task and 2) an automatic speech recognition (ASR) task. Furthermore, we show that existing reconstruction techniques improve their efficacy when used in conjunction with our method.
Researcher Affiliation | Collaboration | Trung Dang (Boston University, trungvd@bu.edu); Om Thakkar (Google, omthkkr@google.com); Swaroop Ramaswamy (Google, swaroopram@google.com); Rajiv Mathews (Google, mathews@google.com); Peter Chin (Boston University, spchin@bu.edu); Françoise Beaufays (Google, fsb@google.com)
Pseudocode | Yes | Algorithm 1: Reveal the set of labels from a weight update of the projection layer. (A hedged sketch of this idea appears after the table.)
Open Source Code | Yes | Our code is published at https://github.com/googleinterns/learning-bag-of-words.
Open Datasets | Yes | We randomly sample 100 batches of size N = 10, 50, 100 from the validation set of ImageNet. We use the ImageNet pre-trained models provided by the Tensorflow (Keras) [25] library. We demonstrate BoW prediction for utterances from LibriSpeech [28]. (A sampling sketch appears after the table.)
Dataset Splits | No | We randomly sample 100 batches of size N = 10, 50, 100 from the validation set of ImageNet. While a validation set is mentioned, the paper does not specify the train/validation/test splits (e.g., percentages or counts per split) for the datasets used in the experiments.
Hardware Specification | No | The paper mentions 'a CPU' and 'a single GPU' or '8 GPUs' for different tasks, but does not provide specific model numbers or detailed hardware specifications.
Software Dependencies | No | The paper mentions the 'Tensorflow (Keras) [25] library' and 'lingvo [27] using Tensorflow [25]', but does not provide version numbers for these software components.
Experiment Setup | Yes | We randomly sample 100 batches of size N = 10, 50, 100 from the validation set of ImageNet. We run the reconstruction with learning rate 0.05, decaying by a factor of 2 after every 4,000 steps. The reconstruction stops after 2,000 steps in which the transcript remains unchanged. The training is performed on centralized training data, with batch size 96 across 8 GPUs. (The learning-rate schedule and stopping rule are sketched after the table.)
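
The Pseudocode row refers to Algorithm 1 of the paper, which reveals the set of labels in a batch from a weight update of the final projection layer. The snippet below is only a minimal sketch of the general idea behind such label-revealing attacks, not the paper's exact Algorithm 1: with softmax cross-entropy and non-negative activations feeding the projection layer, a negative entry in a class's gradient row can only arise if that class is present in the batch. The function name and toy data are assumptions for illustration.

import numpy as np

def reveal_labels_from_grad(grad_w):
    # grad_w: gradient of the loss w.r.t. the projection-layer weights,
    # shape (num_classes, hidden_dim). With softmax cross-entropy and
    # non-negative features feeding the layer, row c can only contain a
    # negative entry if class c appears in the batch. A one-step SGD weight
    # update equals -lr * grad_w, so its signs would simply be flipped.
    return {int(c) for c in np.where(grad_w.min(axis=1) < 0)[0]}

# Toy check with classes {0, 2} present in a 3-example batch.
rng = np.random.default_rng(0)
hidden = np.abs(rng.normal(size=(3, 8)))                  # non-negative features
labels = np.array([0, 2, 2])
logits = rng.normal(size=(3, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
grad_w = (probs - np.eye(5)[labels]).T @ hidden           # (num_classes, hidden_dim)
print(reveal_labels_from_grad(grad_w))                    # typically recovers {0, 2}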
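
The Open Datasets row describes drawing 100 random batches of size N from the ImageNet validation set and scoring them with ImageNet-pretrained Keras models. The sketch below mirrors that sampling step under stated assumptions: the dummy image tensor and the choice of ResNet50 are illustrative, since the quoted text does not name a specific architecture.

import tensorflow as tf

def sample_batches(images, batch_size, num_batches=100, seed=0):
    # Draw `num_batches` random batches of `batch_size` images each.
    ds = tf.data.Dataset.from_tensor_slices(images)
    return ds.shuffle(images.shape[0], seed=seed).batch(batch_size).take(num_batches)

# Stand-in for the ImageNet validation images (illustrative dummy data).
val_images = tf.random.uniform([256, 224, 224, 3], maxval=255.0)
model = tf.keras.applications.ResNet50(weights="imagenet")  # assumed architecture
for batch in sample_batches(val_images, batch_size=10):
    preds = model(tf.keras.applications.resnet50.preprocess_input(batch))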
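
The Experiment Setup row quotes a reconstruction schedule: learning rate 0.05, halved every 4,000 steps, stopping once the decoded transcript has stayed unchanged for 2,000 steps. The helpers below are a small sketch of those two rules using the quoted hyperparameters; the names and structure are illustrative assumptions, not taken from the released code.

def reconstruction_lr(step, base_lr=0.05, decay_every=4000, decay_factor=2.0):
    # Learning rate 0.05, decayed by a factor of 2 after every 4,000 steps.
    return base_lr / (decay_factor ** (step // decay_every))

class TranscriptEarlyStopper:
    # Stop the reconstruction once the decoded transcript has stayed
    # unchanged for `patience` consecutive steps (2,000 in the quoted setup).
    def __init__(self, patience=2000):
        self.patience = patience
        self._last = None
        self._unchanged = 0

    def should_stop(self, transcript):
        if transcript == self._last:
            self._unchanged += 1
        else:
            self._last = transcript
            self._unchanged = 0
        return self._unchanged >= self.patience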