An Online Sequence-to-Sequence Model Using Partial Conditioning

Authors: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the Neural Transducer works well in settings where it is required to produce output predictions as data come in. We also find that the Neural Transducer performs well for long sequences even when attention mechanisms are not used. ... On the TIMIT phoneme recognition task, a Neural Transducer (with a 3-layer unidirectional LSTM encoder and a 3-layer unidirectional LSTM transducer) can achieve a phoneme error rate (PER) of 20.8%, which is close to state-of-the-art for unidirectional models. (A hedged architecture sketch based on this description follows the table.)
Researcher Affiliation | Industry | Navdeep Jaitly (Google Brain, ndjaitly@google.com); David Sussillo (Google Brain, sussillo@google.com); Quoc V. Le (Google Brain, qvl@google.com); Oriol Vinyals (Google DeepMind, vinyals@google.com); Ilya Sutskever (OpenAI, ilyasu@openai.com); Samy Bengio (Google Brain, bengio@google.com)
Pseudocode | No | The paper describes algorithms in text (e.g., dynamic programming for alignment, beam search for inference), but does not present structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release, or mention of code in supplementary materials) for the described methodology.
Open Datasets | Yes | We used TIMIT, a standard benchmark for speech recognition, for our larger experiments.
Dataset Splits | Yes | We used TIMIT, a standard benchmark for speech recognition, for our larger experiments. ... Note that TIMIT provides a validation set, called the dev set. We use these terms interchangeably.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the 'Kaldi toolkit' to generate alignments but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | We used stochastic gradient descent with momentum with a batch size of one utterance per training step. An initial learning rate of 0.05 and a momentum of 0.9 were used. The learning rate was reduced by a factor of 0.5 every time the average log prob over the validation set decreased. The decrease was applied a maximum of 4 times. The models were trained for 50 epochs and the parameters from the epoch with the best dev set log prob were used for decoding. (A hedged sketch of this schedule follows the table.)
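The Research Type row above quotes the TIMIT model configuration: a 3-layer unidirectional LSTM encoder and a 3-layer unidirectional LSTM transducer that produces outputs as data come in. The sketch below restates that description as a minimal PyTorch module; the feature dimension, hidden size, block width, the dot-product attention, and all identifiers are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NeuralTransducerSketch(nn.Module):
    """Minimal sketch: a 3-layer unidirectional LSTM encoder plus a 3-layer LSTM
    transducer that emits symbols one input block at a time. Every dimension
    (n_feats, hidden, block_size) is a hypothetical placeholder."""

    def __init__(self, n_feats=40, n_symbols=62, hidden=256, block_size=15):
        super().__init__()
        self.block_size = block_size
        self.encoder = nn.LSTM(n_feats, hidden, num_layers=3, batch_first=True)
        self.embed = nn.Embedding(n_symbols, hidden)
        self.transducer = nn.LSTM(2 * hidden, hidden, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden, n_symbols)

    def step(self, enc_block, prev_symbol, state=None):
        # enc_block: (1, W, hidden) encoder outputs for the current input block.
        # prev_symbol: (1, 1) index of the most recently emitted symbol.
        emb = self.embed(prev_symbol)                       # (1, 1, hidden)
        # Simple dot-product attention restricted to the current block, so symbols
        # can be produced online rather than after the whole utterance is seen.
        scores = torch.bmm(emb, enc_block.transpose(1, 2))  # (1, 1, W)
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_block)
        dec_out, state = self.transducer(torch.cat([emb, context], dim=-1), state)
        return self.out(dec_out), state

# Hypothetical single-block usage: encode 15 frames of 40-dim features, then
# take one transducer step conditioned on a start symbol (index 0).
model = NeuralTransducerSketch()
enc, _ = model.encoder(torch.randn(1, 15, 40))
logits, state = model.step(enc, torch.tensor([[0]]))
```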
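The Experiment Setup row gives the optimization recipe in enough detail to restate as code. Below is a small plain-Python sketch of that learning-rate schedule, assuming the halving rule fires whenever the average validation log probability falls below the best value seen so far and fires at most four times; the function and variable names are invented for illustration, and the model, data, and SGD update itself are omitted.

```python
def step_learning_rate(lr, best_val_logprob, val_logprob, reductions,
                       factor=0.5, max_reductions=4):
    """Halve the learning rate when the average validation log prob decreases,
    applying the reduction at most `max_reductions` times, per the quoted setup
    (initial lr 0.05, momentum 0.9, batch size of one utterance)."""
    if val_logprob < best_val_logprob and reductions < max_reductions:
        return lr * factor, reductions + 1
    return lr, reductions

# Hypothetical 50-epoch loop tracking the best validation log prob; per the
# quoted setup, the parameters from the best epoch are the ones used for decoding.
lr, reductions, best = 0.05, 0, float("-inf")
for epoch in range(50):
    val_logprob = -1.0 - 0.01 * epoch   # placeholder for a real validation pass
    lr, reductions = step_learning_rate(lr, best, val_logprob, reductions)
    best = max(best, val_logprob)
```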