Sequence to Sequence Training of CTC-RNNs with Partial Windowing
Authors: Kyuyeon Hwang, Wonyong Sung
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach achieves 20.7% phoneme error rate (PER) on the very long input sequence that is generated by concatenating all 192 utterances in the TIMIT core test set. In the end-to-end speech recognition task on the Wall Street Journal corpus, a network can be trained with only 64 times of unrolling with little performance loss. |
| Researcher Affiliation | Academia | Kyuyeon Hwang (KYUYEON.HWANG@GMAIL.COM) and Wonyong Sung (WYSUNG@SNU.AC.KR), Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea |
| Pseudocode | Yes | Algorithm 1: Online CTC training with BPTT(h; h′) for a single sequence; Algorithm 2: CTC(h; h′) at the iteration n (a hedged training-loop sketch follows the table) |
| Open Source Code | No | The paper does not provide any statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | The experiments are performed on the Wall Street Journal (WSJ) (Paul & Baker, 1992) corpus. Further experiments are performed on TIMIT (Garofolo et al., 1993) in the supplementary material |
| Dataset Splits | Yes | The WSJ Nov 93 20K development set and the WSJ Nov 92 20K evaluation set are used as the development (validation) set and the test (evaluation) set, respectively. |
| Hardware Specification | Yes | The training speed is measured on a system equipped with an NVIDIA GeForce Titan X GPU and an Intel Xeon E5-2620 CPU. |
| Software Dependencies | No | The paper mentions tools and algorithms such as HTK, ADADELTA, and Nesterov momentum, but does not specify versions of the core software libraries or frameworks used for the implementation. |
| Experiment Setup | Yes | The acoustic RNN has 3 unidirectional LSTM layers, where each layer has 768 LSTM cells. The training starts from a learning rate of 10⁻⁵ and finishes when the learning rate becomes less than 10⁻⁷. For the online update of the RNN parameters, the stochastic gradient descent (SGD) method is employed and accelerated by a Nesterov momentum of 0.9. The RNN LM is integrated with a beam width of 512, a beam depth of 50, an LM weight of 2.0, and an insertion bonus of 1.5. (A hedged schedule and decoding sketch also follows the table.) |
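
To make the windowed training that Algorithm 1 describes concrete, below is a minimal PyTorch sketch. It is a hedged reconstruction, not the authors' implementation: the windows here do not overlap (i.e., h = h′), the label segment covered by each window is assumed to be given (CTC-TR style), and the paper's CTC-EM step that estimates window labels is omitted. `FEAT_DIM`, `N_LABELS`, `WINDOW`, and the data layout are illustrative assumptions; the 3-layer, 768-cell unidirectional LSTM, the 10⁻⁵ learning rate, and the Nesterov momentum of 0.9 are taken from the quoted setup.

```python
# Minimal sketch of windowed CTC training in the spirit of Algorithm 1,
# assuming PyTorch.  Non-overlapping windows (h = h'); per-window labels
# are assumed given, so the paper's CTC-EM estimation is not reproduced.
import torch
import torch.nn as nn

FEAT_DIM, HIDDEN, N_LABELS = 40, 768, 62  # feature/label sizes are assumed
WINDOW = 64                               # unroll length h (frames per window)

rnn = nn.LSTM(FEAT_DIM, HIDDEN, num_layers=3)  # 3 unidirectional LSTM layers
proj = nn.Linear(HIDDEN, N_LABELS + 1)         # extra class for the CTC blank
ctc = nn.CTCLoss(blank=N_LABELS, zero_infinity=True)
opt = torch.optim.SGD(
    list(rnn.parameters()) + list(proj.parameters()),
    lr=1e-5, momentum=0.9, nesterov=True)      # values quoted in the paper

def train_long_sequence(x, window_labels):
    """x: (T, 1, FEAT_DIM) feature stream.
    window_labels: iterable of (start_frame, LongTensor of labels) pairs."""
    hidden = None
    for start, labels in window_labels:
        chunk = x[start:start + WINDOW]             # (h, 1, FEAT_DIM)
        out, hidden = rnn(chunk, hidden)
        # Detach so gradients do not flow past the window boundary,
        # while the forward state still carries over (online operation).
        hidden = tuple(s.detach() for s in hidden)
        log_probs = proj(out).log_softmax(-1)       # (h, 1, N_LABELS + 1)
        loss = ctc(log_probs,
                   labels.unsqueeze(0),             # (1, S) target labels
                   torch.tensor([log_probs.size(0)]),
                   torch.tensor([labels.numel()]))
        opt.zero_grad()
        loss.backward()                             # BPTT within the window
        opt.step()
```

The detach at each window boundary is what bounds the unrolling depth: the forward pass runs continuously over the long sequence, but backpropagation never reaches past the current window, which is the property that lets the paper train on streams with as few as 64 unroll steps.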
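The learning-rate schedule and decoding parameters quoted in the Experiment Setup row can be summarized in a small configuration sketch. Only the start/stop rates and the decoding values come from the paper; the halve-on-plateau annealing rule is an assumption for illustration.

```python
# Hedged sketch of the quoted schedule and decoding parameters.  The paper
# states only the start and stop learning rates; the halving rule below is
# an illustrative assumption, not the authors' stated policy.
LR_START, LR_STOP = 1e-5, 1e-7

def next_lr(lr, dev_error_improved):
    """Assumed annealing rule: halve the rate when dev error stops improving."""
    return lr if dev_error_improved else lr * 0.5

# Decoding configuration as quoted from the paper (RNN LM integration).
DECODE_CFG = {
    "beam_width": 512,       # hypotheses kept per step
    "beam_depth": 50,
    "lm_weight": 2.0,        # scale on the RNN LM score
    "insertion_bonus": 1.5,  # per-word insertion bonus
}
```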