Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems
Authors: Menoua Keshishian, Samuel Norman-Haignere, Nima Mesgarani
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied our method to understand how the popular DeepSpeech2 model learns to integrate across time in speech. We find that nearly all of the model units, even in recurrent layers, have a compact integration window within which stimuli substantially alter the response and outside of which stimuli have little effect. We show that training causes these integration windows to shrink at early layers and expand at higher layers, creating a hierarchy of integration windows across the network. |
| Researcher Affiliation | Academia | Menoua Keshishian, Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, mk4011@columbia.edu; Sam V. Norman-Haignere, Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, sn2776@columbia.edu; Nima Mesgarani, Department of Electrical Engineering, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, nima@ee.columbia.edu |
| Pseudocode | No | The paper describes the methods and analysis steps in prose and with mathematical equations, but does not include any blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code available at: https://github.com/naplab/PyTCI |
| Open Datasets | Yes | Sound segments were excerpted from the LibriSpeech corpus dev-clean and test-clean sets. All models were implemented in PyTorch (Paszke et al., 2019) and trained using PyTorch Lightning (Falcon, 2019) on the training set of the LibriSpeech corpus (Panayotov et al., 2015). (See the loading sketch after the table.) |
| Dataset Splits | No | The paper mentions using the 'LibriSpeech test-clean set' for evaluation and the 'training set of the LibriSpeech corpus' for training, but does not specify a separate validation set split or its size/proportion. |
| Hardware Specification | Yes | Training and inference of all models were performed on NVIDIA A40 GPUs (one per training/inference) at the internal cluster at the Zuckerman Institute of Columbia University. |
| Software Dependencies | No | All models were implemented in PyTorch (Paszke et al., 2019) and trained using PyTorch Lightning (Falcon, 2019)... Augmentations were performed using the Sound eXchange (SoX) backend of the audio library for PyTorch (torchaudio). While PyTorch and PyTorch Lightning are cited, their specific versions are not explicitly stated in the text for replication. |
| Experiment Setup | Yes | We used the CTC loss (Graves et al., 2006), the Adam optimizer (learning rate: 1.5e-4, weight decay: 1e-5) (Kingma and Ba, 2014), and a batch size of 64; all models were trained for 20 epochs. (A hedged sketch of this setup follows the table.) |
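As a companion to the dataset rows above, the snippet below sketches how the LibriSpeech splits named in the paper can be fetched with torchaudio's built-in `LIBRISPEECH` dataset class. The paper does not state which LibriSpeech training subset was used, so `train-clean-100` and the `ROOT` path are assumptions for illustration only.

```python
import torchaudio

ROOT = "./data"  # placeholder download location (assumption)

# Splits named in the paper: a LibriSpeech training set for training,
# dev-clean and test-clean for excerpting the evaluation sound segments.
# "train-clean-100" is one possible training subset; the paper does not say which.
train_set = torchaudio.datasets.LIBRISPEECH(ROOT, url="train-clean-100", download=True)
dev_set = torchaudio.datasets.LIBRISPEECH(ROOT, url="dev-clean", download=True)
test_set = torchaudio.datasets.LIBRISPEECH(ROOT, url="test-clean", download=True)

# Each item is (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id).
waveform, sample_rate, transcript, *_ = train_set[0]
print(waveform.shape, sample_rate, transcript)
```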
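The Experiment Setup row reports a complete optimization recipe: CTC loss, Adam with learning rate 1.5e-4 and weight decay 1e-5, batch size 64, and 20 training epochs. A minimal PyTorch sketch of that recipe follows. `TinyCTCModel`, the blank index, and the dummy batch shapes are stand-ins for illustration, not the paper's DeepSpeech2 implementation.

```python
import torch
import torch.nn as nn


class TinyCTCModel(nn.Module):
    """Hypothetical stand-in for the paper's DeepSpeech2 model: GRU + linear head."""

    def __init__(self, n_mels=80, hidden=256, n_classes=29):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, num_layers=2, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):  # x: (N, T, n_mels)
        out, _ = self.rnn(x)
        # CTCLoss expects log-probabilities shaped (T, N, C).
        return self.fc(out).log_softmax(-1).transpose(0, 1)


model = TinyCTCModel()
criterion = nn.CTCLoss(blank=0)  # blank index is an assumption
# Reported hyperparameters: lr 1.5e-4, weight decay 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1.5e-4, weight_decay=1e-5)

# One dummy batch at the reported batch size of 64; real training would
# iterate LibriSpeech batches for the reported 20 epochs.
inputs = torch.randn(64, 200, 80)                  # (N, T, n_mels) features
targets = torch.randint(1, 29, (64, 30))           # label sequences (blank=0 excluded)
input_lengths = torch.full((64,), 200)
target_lengths = torch.full((64,), 30)

log_probs = model(inputs)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```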