An Online Sequence-to-Sequence Model Using Partial Conditioning
Authors: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio
NIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the Neural Transducer works well in settings where it is required to produce output predictions as data come in. We also find that the Neural Transducer performs well for long sequences even when attention mechanisms are not used. ... On the TIMIT phoneme recognition task, a Neural Transducer (with a 3-layer unidirectional LSTM encoder and a 3-layer unidirectional LSTM transducer) can achieve a phoneme error rate (PER) of 20.8%, which is close to state-of-the-art for unidirectional models. (A rough sketch of this block-wise architecture follows the table.) |
| Researcher Affiliation | Industry | Navdeep Jaitly (Google Brain, ndjaitly@google.com); David Sussillo (Google Brain, sussillo@google.com); Quoc V. Le (Google Brain, qvl@google.com); Oriol Vinyals (Google DeepMind, vinyals@google.com); Ilya Sutskever (OpenAI, ilyasu@openai.com); Samy Bengio (Google Brain, bengio@google.com) |
| Pseudocode | No | The paper describes its algorithms in text (e.g., dynamic programming for alignment, beam search for inference) but does not present structured pseudocode or algorithm blocks. (A hedged sketch of the alignment recurrence appears after the table.) |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., repository link, explicit statement of code release, or mention of code in supplementary materials) for the described methodology. |
| Open Datasets | Yes | We used TIMIT, a standard benchmark for speech recognition, for our larger experiments. |
| Dataset Splits | Yes | We used TIMIT, a standard benchmark for speech recognition, for our larger experiments. ... Note that TIMIT provides a validation set, called the dev set. We use these terms interchangeably. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using the 'Kaldi toolkit' to generate alignments but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We used stochastic gradient descent with momentum with a batch size of one utterance per training step. An initial learning rate of 0.05 and a momentum of 0.9 were used. The learning rate was reduced by a factor of 0.5 every time the average log prob over the validation set decreased. The decrease was applied at most 4 times. The models were trained for 50 epochs, and the parameters from the epoch with the best dev set log prob were used for decoding. (A sketch of this schedule follows the table.) |
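
To make the quoted setup concrete, here is a minimal PyTorch sketch of a block-wise encoder/transducer in the spirit of the model described above. It is not the authors' code: the hidden size, the use of the last encoder state in place of the paper's within-block attention, and the `step` interface are all illustrative assumptions.

```python
import torch
from torch import nn

class NeuralTransducerSketch(nn.Module):
    """Illustrative block-wise encoder/transducer; hyperparameters are
    assumptions, not values from the paper."""

    def __init__(self, feat_dim, vocab_size, hidden=256, layers=3):
        super().__init__()
        # 3-layer unidirectional LSTM encoder over W-frame input blocks.
        self.encoder = nn.LSTM(feat_dim, hidden, layers, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        # 3-layer unidirectional LSTM transducer; its input concatenates the
        # previous token embedding with a block context vector.
        self.transducer = nn.LSTM(2 * hidden, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)  # vocab includes <end-of-block>

    def step(self, block_feats, prev_tokens, enc_state=None, dec_state=None):
        """Process one W-frame block and score candidate output tokens."""
        enc_out, enc_state = self.encoder(block_feats, enc_state)
        # Stand-in for the paper's within-block attention: take the last
        # encoder state of the block as the context vector.
        ctx = enc_out[:, -1:, :]
        emb = self.embed(prev_tokens)
        dec_in = torch.cat([emb, ctx.expand(-1, emb.size(1), -1)], dim=-1)
        dec_out, dec_state = self.transducer(dec_in, dec_state)
        return self.out(dec_out), enc_state, dec_state
```

At inference time, `step` would be called once per W-frame block as audio arrives, feeding emitted tokens back in until the model produces the end-of-block symbol, which is what lets the model make predictions online rather than waiting for the full input sequence.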
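Since the paper gives its alignment procedure only in prose, the following sketch shows the idealized dynamic-programming recurrence for splitting `num_tokens` output tokens across `num_blocks` input blocks. The paper notes that the transducer's state depends on the full emission history, so the authors approximate this search with a beam; `span_logprob(b, i, j)` here is a hypothetical callback scoring the emission of tokens `i..j` (exclusive) followed by an end-of-block symbol in block `b`.

```python
import math

def align_dp(num_tokens, num_blocks, span_logprob):
    """Idealized alignment DP: maximize total log prob over block splits."""
    NEG_INF = -math.inf
    # best[b][j]: best log prob of having emitted the first j tokens
    # by the end of block b; back[b][j] remembers the split point.
    best = [[NEG_INF] * (num_tokens + 1) for _ in range(num_blocks + 1)]
    back = [[0] * (num_tokens + 1) for _ in range(num_blocks + 1)]
    best[0][0] = 0.0
    for b in range(1, num_blocks + 1):
        for j in range(num_tokens + 1):
            for i in range(j + 1):  # tokens i..j are emitted in block b
                prev = best[b - 1][i]
                if prev == NEG_INF:
                    continue
                score = prev + span_logprob(b, i, j)
                if score > best[b][j]:
                    best[b][j] = score
                    back[b][j] = i
    # Trace back the per-block token spans of the best alignment.
    spans, j = [], num_tokens
    for b in range(num_blocks, 0, -1):
        i = back[b][j]
        spans.append((b, i, j))
        j = i
    return best[num_blocks][num_tokens], list(reversed(spans))

# Toy usage with a uniform scorer (every split scores the same).
score, spans = align_dp(3, 2, lambda b, i, j: -(j - i + 1))
```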
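The training recipe in the last row maps naturally onto a small driver loop. The sketch below encodes the quoted hyperparameters (learning rate 0.05, momentum 0.9, halving on dev-set regressions at most 4 times, 50 epochs, best-dev checkpointing); `train_one_epoch`, `dev_log_prob`, and `snapshot` are hypothetical helpers standing in for model-specific code, not functions from the paper.

```python
import math

def fit(model, train_utterances, dev_set, epochs=50):
    lr, momentum = 0.05, 0.9          # values quoted in the table above
    reductions, prev_dev = 0, -math.inf
    best_dev, best_params = -math.inf, None
    for epoch in range(epochs):
        # SGD with momentum, batch size of one utterance per step.
        train_one_epoch(model, train_utterances, lr=lr, momentum=momentum)
        dev = dev_log_prob(model, dev_set)
        # Halve the learning rate whenever the average dev log prob drops,
        # at most 4 times over the whole run.
        if dev < prev_dev and reductions < 4:
            lr *= 0.5
            reductions += 1
        prev_dev = dev
        # Keep the parameters from the epoch with the best dev log prob.
        if dev > best_dev:
            best_dev, best_params = dev, snapshot(model)
    return best_params
```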