Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
Authors: Yonatan Belinkov, James Glass
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments shed light on important aspects of the end-to-end model such as layer depth, model complexity, and other design choices. |
| Researcher Affiliation | Academia | Yonatan Belinkov and James Glass Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA 02139 {belinkov, glass}@mit.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | The code for all of our experiments is publicly available at http://github.com/boknilev/asr-repr-analysis |
| Open Datasets | Yes | The end-to-end models are trained on LibriSpeech [34], a publicly available corpus of English read speech, containing 1,000 hours sampled at 16 kHz. For the phoneme recognition task, we use TIMIT, which comes with time segmentation of phones. |
| Dataset Splits | Yes | We use the official train/development/test split and extract frames for the frame classification task. Table 2 summarizes statistics of the frame classification dataset. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions "deepspeech.torch [33]" but does not provide a specific version number. No other software with version numbers was listed. |
| Experiment Setup | Yes | We model the classifier as a feed-forward neural network with one hidden layer, where the size of the hidden layer is set to 500. We train the classifier with Adam [32] with the recommended parameters (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) to minimize the cross-entropy loss. We use a batch size of 16, train the model for 30 epochs, and choose the model with the best development loss for evaluation. |
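The experiment-setup row above can be sketched as a minimal PyTorch probe. This is not the authors' released implementation (see their repository for that); the input dimensionality and number of phone classes below are illustrative assumptions, while the hidden size (500), optimizer hyperparameters, and batch size (16) follow the paper.

```python
import torch
import torch.nn as nn

# Illustrative placeholders -- NOT values stated in the paper.
INPUT_DIM = 1024    # assumed dimensionality of the probed hidden representation
NUM_CLASSES = 48    # assumed number of phone labels
HIDDEN_SIZE = 500   # hidden-layer size stated in the paper

# Feed-forward classifier with one hidden layer of size 500.
classifier = nn.Sequential(
    nn.Linear(INPUT_DIM, HIDDEN_SIZE),
    nn.ReLU(),
    nn.Linear(HIDDEN_SIZE, NUM_CLASSES),
)

# Adam with the paper's stated hyperparameters.
optimizer = torch.optim.Adam(
    classifier.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-8
)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on random data, batch size 16 as in the paper.
features = torch.randn(16, INPUT_DIM)
labels = torch.randint(0, NUM_CLASSES, (16,))

optimizer.zero_grad()
logits = classifier(features)       # shape: (16, NUM_CLASSES)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

In the full setup this step would run for 30 epochs, keeping the checkpoint with the best development loss.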