Towards End-To-End Speech Recognition with Recurrent Neural Networks

Authors: Alex Graves, Navdeep Jaitly

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the Wall Street Journal speech corpus demonstrate that the system is able to recognise words to reasonable accuracy, even in the absence of a language model or dictionary, and that when combined with a language model it performs comparably to a state-of-the-art pipeline. The experiments were carried out on the Wall Street Journal (WSJ) corpus (available as LDC corpus LDC93S6B and LDC94S13B).
Researcher Affiliation | Collaboration | Alex Graves (GRAVES@CS.TORONTO.EDU), Google DeepMind, London, United Kingdom; Navdeep Jaitly (NDJAITLY@CS.TORONTO.EDU), Department of Computer Science, University of Toronto, Canada
Pseudocode | Yes | Algorithm 1: CTC Beam Search (a simplified decoder sketch is given after the table)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the described methodology, nor does it include a link to a code repository.
Open Datasets | Yes | The experiments were carried out on the Wall Street Journal (WSJ) corpus (available as LDC corpus LDC93S6B and LDC94S13B).
Dataset Splits | Yes | The RNN was trained on both the 14 hour subset train-si84 and the full 81 hour set, with the test-dev93 development set used for validation.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper mentions the 'matplotlib python toolkit' and the 'Kaldi recipe s5' but does not give version numbers for these or for any other software dependencies.
Experiment Setup | Yes | The network had five levels of bidirectional LSTM hidden layers, with 500 cells in each layer, giving a total of 26.5M weights. It was trained using stochastic gradient descent with one weight update per utterance, a learning rate of 10^-4 and a momentum of 0.9. The DNN was trained with stochastic gradient descent, starting with a learning rate of 0.1 and a momentum of 0.9. (A hedged training-loop sketch of this configuration follows the table.)
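The paper's Algorithm 1 (CTC Beam Search) additionally rescores each extension with dictionary and language-model terms. The snippet below is a minimal prefix beam search sketch of the same idea without those terms, assuming per-frame character probabilities in a NumPy array with the CTC blank at index 0; names such as `ctc_beam_search` and `alphabet` are illustrative, not taken from the paper.

```python
import numpy as np
from collections import defaultdict

def ctc_beam_search(probs, alphabet, beam_width=10, blank=0):
    """Simplified CTC prefix beam search (no dictionary or language model).

    probs:    (T, V) array of per-frame character probabilities,
              with probs[:, blank] the CTC blank probability.
    alphabet: sequence mapping label index -> character (index 0 = blank).
    """
    # Each beam entry maps a prefix (tuple of label indices) to
    # [p_blank, p_nonblank]: probability of the prefix ending in a blank
    # or in a non-blank symbol at the current frame.
    beams = {(): [1.0, 0.0]}

    for t in range(probs.shape[0]):
        new_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for c in range(probs.shape[1]):
                p = probs[t, c]
                if c == blank:
                    # Extending with blank leaves the prefix unchanged.
                    new_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and prefix[-1] == c:
                    # Repeated symbol: a new occurrence is only valid
                    # after a blank; otherwise it collapses into the prefix.
                    new_beams[prefix + (c,)][1] += p_b * p
                    new_beams[prefix][1] += p_nb * p
                else:
                    new_beams[prefix + (c,)][1] += (p_b + p_nb) * p
        # Keep only the most probable beam_width prefixes.
        beams = dict(sorted(new_beams.items(),
                            key=lambda kv: sum(kv[1]),
                            reverse=True)[:beam_width])

    best = max(beams.items(), key=lambda kv: sum(kv[1]))[0]
    return "".join(alphabet[i] for i in best)
```

In practice the sums would be computed in log space (log-sum-exp) for numerical stability, and the paper's full algorithm weights each character extension by a language-model transition probability.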
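For the Experiment Setup row, the following is a rough PyTorch sketch of the described configuration: five bidirectional LSTM levels with 500 cells per layer, a CTC output layer, and plain SGD with one weight update per utterance, learning rate 10^-4 and momentum 0.9. The input feature dimension and character-set size are placeholders rather than values from the paper, and stacking the levels with a single nn.LSTM is only an approximation of the paper's layer-by-layer architecture.

```python
import torch
import torch.nn as nn

NUM_CHARS = 43   # assumed character-set size incl. CTC blank (placeholder)
INPUT_DIM = 123  # assumed acoustic feature dimension (placeholder)

class DeepBiLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        # Five bidirectional LSTM levels, 500 cells per direction.
        self.lstm = nn.LSTM(INPUT_DIM, 500, num_layers=5,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * 500, NUM_CHARS)

    def forward(self, x):
        out, _ = self.lstm(x)                  # (batch, time, 1000)
        return self.proj(out).log_softmax(-1)  # per-frame log-probs

model = DeepBiLSTM()
ctc = nn.CTCLoss(blank=0)
opt = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# One weight update per utterance, i.e. batch size 1, with dummy data.
feats = torch.randn(1, 200, INPUT_DIM)          # one synthetic utterance
targets = torch.randint(1, NUM_CHARS, (1, 30))  # synthetic character labels

opt.zero_grad()
log_probs = model(feats).transpose(0, 1)        # CTCLoss expects (T, N, C)
loss = ctc(log_probs, targets,
           input_lengths=torch.tensor([200]),
           target_lengths=torch.tensor([30]))
loss.backward()
opt.step()
```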