Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems

Authors: Yonatan Belinkov, James Glass

NeurIPS 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices. |
| Researcher Affiliation | Academia | Yonatan Belinkov and James Glass, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. {belinkov, glass}@mit.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The code for all of our experiments is publicly available. (http://github.com/boknilev/asr-repr-analysis) |
| Open Datasets | Yes | The end-to-end models are trained on LibriSpeech [34], a publicly available corpus of English read speech, containing 1,000 hours sampled at 16 kHz. For the phoneme recognition task, we use TIMIT, which comes with time segmentation of phones. |
| Dataset Splits | Yes | We use the official train/development/test split and extract frames for the frame classification task. Table 2 summarizes statistics of the frame classification dataset. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions deepspeech.torch [33] but does not provide a specific version number. No other software with version numbers is listed. |
| Experiment Setup | Yes | We model the classifier as a feed-forward neural network with one hidden layer, where the size of the hidden layer is set to 500. We train the classifier with Adam [32] with the recommended parameters (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e−8) to minimize the cross-entropy loss. We use a batch size of 16, train the model for 30 epochs, and choose the model with the best development loss for evaluation. |
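The experiment setup in the last row can be sketched in code. Below is a minimal NumPy illustration of such a probing classifier: one 500-unit hidden layer trained with Adam (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e−8) on cross-entropy loss, batch size 16, 30 epochs. The feature dimension, class count, ReLU activation, weight initialization, and the toy data are assumptions for illustration, not details from the paper, which also selects the best model by development loss (omitted here).

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, HIDDEN, N_CLASSES = 64, 500, 10  # hidden size 500 per the paper; rest hypothetical

# Parameters of a one-hidden-layer feed-forward classifier
W1 = rng.normal(0, 0.1, (FEAT_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_CLASSES)); b2 = np.zeros(N_CLASSES)
params = [W1, b1, W2, b2]

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)          # ReLU hidden layer (activation is an assumption)
    return h, h @ W2 + b2                     # hidden activations, logits

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(x, y):
    h, logits = forward(x)
    p = softmax(logits)
    n = x.shape[0]
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()   # cross-entropy
    dlogits = p.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    gW2, gb2 = h.T @ dlogits, dlogits.sum(0)
    dh = dlogits @ W2.T
    dh[h <= 0] = 0.0                                     # ReLU gradient
    return loss, [x.T @ dh, dh.sum(0), gW2, gb2]

# Adam with the recommended parameters quoted in the paper
alpha, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
m = [np.zeros_like(p) for p in params]
v = [np.zeros_like(p) for p in params]

# Toy stand-in for frame-level features and labels (placeholder data)
X = rng.normal(size=(256, FEAT_DIM))
y = np.where(X[:, 0] > 0, 5, 0)  # arbitrary, easily separable labels

t = 0
for epoch in range(30):                        # 30 epochs, batch size 16
    perm = rng.permutation(len(X))
    for i in range(0, len(X), 16):
        xb, yb = X[perm[i:i + 16]], y[perm[i:i + 16]]
        loss, grads = loss_and_grads(xb, yb)
        t += 1
        for j, (p, g) in enumerate(zip(params, grads)):
            m[j] = beta1 * m[j] + (1 - beta1) * g
            v[j] = beta2 * v[j] + (1 - beta2) * g * g
            mhat = m[j] / (1 - beta1 ** t)     # bias-corrected first moment
            vhat = v[j] / (1 - beta2 ** t)     # bias-corrected second moment
            p -= alpha * mhat / (np.sqrt(vhat) + eps)

acc = (forward(X)[1].argmax(axis=1) == y).mean()
```

In the paper the inputs `X` would be hidden representations extracted from a trained end-to-end ASR model, and `y` would be frame-level phone labels; the classifier's accuracy is then read as a measure of how much phonetic information the representation encodes.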