Latent Sequence Decompositions

Authors: William Chan, Yu Zhang, Quoc Le, Navdeep Jaitly

Venue: ICLR 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We experiment with the Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER compared to a character baseline of 14.8% WER. |
| Researcher Affiliation | Collaboration | William Chan (Carnegie Mellon University, williamchan@cmu.edu); Yu Zhang (Massachusetts Institute of Technology, yzhang87@mit.edu); Quoc V. Le and Navdeep Jaitly (Google Brain, {qvl,ndjaitly}@google.com) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., a repository link, an explicit statement of code release, or code in supplementary materials) for the source code of the described methodology. |
| Open Datasets | Yes | We experimented with the Wall Street Journal (WSJ) ASR task. We used the standard configuration of train si284 dataset for training, dev93 for validation and eval92 for test evaluation. |
| Dataset Splits | Yes | We used the standard configuration of train si284 dataset for training, dev93 for validation and eval92 for test evaluation. |
| Hardware Specification | No | We used 8 GPU workers for asynchronous SGD under the TensorFlow framework (Abadi et al., 2015). (No specific GPU models or other hardware details are provided.) |
| Software Dependencies | No | The paper mentions using the TensorFlow framework and features generated by Kaldi, but does not provide version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Our input features were 80 dimensional filterbanks computed every 10ms with delta and delta-delta acceleration normalized with per speaker mean and variance as generated by Kaldi (Povey et al., 2011). The EncodeRNN function is a 3 layer BLSTM with 256 LSTM units per-direction (or 512 total) and a 4 = 2² time factor reduction. The DecodeRNN is a 1 layer LSTM with 256 LSTM units. All the weight matrices were initialized with a uniform distribution U(-0.075, 0.075) and bias vectors to 0. Gradient norm clipping of 1 was used, Gaussian weight noise N(0, 0.075) and L2 weight decay 1e-5 (Graves, 2011). We used ADAM with the default hyperparameters described in (Kingma & Ba, 2015), however we decayed the learning rate from 1e-3 to 1e-4. (A hedged configuration sketch based on this description follows the table.) |
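
The Experiment Setup row gives enough detail to reconstruct the rough shape of the model and its optimizer. The following is a minimal PyTorch sketch of that configuration, not the authors' implementation (which used TensorFlow and was not released): the class names, the frame-concatenation used to realise the 4x (= 2²) time reduction, the illustrative vocabulary size, and the omission of the attention mechanism are all assumptions made for this sketch.

```python
# Hedged sketch of the configuration quoted in the Experiment Setup row.
# Assumptions (not from the paper): module names, pyramidal frame concatenation
# for the 4x time reduction, and a decoder without the attention mechanism.
import torch
import torch.nn as nn


class ListenerEncoder(nn.Module):
    """3-layer BLSTM, 256 units per direction (512 total per frame)."""

    def __init__(self, input_dim: int = 240, hidden: int = 256):
        # 240 = 80 filterbanks x (static + delta + delta-delta).
        super().__init__()
        self.blstm1 = nn.LSTM(input_dim, hidden, bidirectional=True, batch_first=True)
        self.blstm2 = nn.LSTM(4 * hidden, hidden, bidirectional=True, batch_first=True)
        self.blstm3 = nn.LSTM(4 * hidden, hidden, bidirectional=True, batch_first=True)

    @staticmethod
    def _halve_time(x: torch.Tensor) -> torch.Tensor:
        # Concatenate consecutive frame pairs: (B, T, D) -> (B, T//2, 2D).
        b, t, d = x.shape
        t -= t % 2
        return x[:, :t].reshape(b, t // 2, 2 * d)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.blstm1(feats)
        h, _ = self.blstm2(self._halve_time(h))  # time / 2
        h, _ = self.blstm3(self._halve_time(h))  # time / 4 = 2^2 overall
        return h  # (B, T // 4, 512)


class SpellerDecoder(nn.Module):
    """1-layer LSTM decoder with 256 units; attention over the encoder
    output is part of the paper's model but omitted from this sketch."""

    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(tokens))
        return self.proj(h)  # logits over the output token vocabulary


def init_params(model: nn.Module, scale: float = 0.075) -> None:
    # Weights ~ U(-0.075, 0.075), biases set to 0, as stated above.
    for name, p in model.named_parameters():
        if "bias" in name:
            nn.init.zeros_(p)
        else:
            nn.init.uniform_(p, -scale, scale)


encoder, decoder = ListenerEncoder(), SpellerDecoder(vocab_size=64)  # vocab size is illustrative
init_params(encoder)
init_params(decoder)

params = list(encoder.parameters()) + list(decoder.parameters())
# ADAM with default betas; the learning rate is decayed from 1e-3 to 1e-4
# during training, and weight_decay approximates the stated L2 penalty of 1e-5.
optimizer = torch.optim.Adam(params, lr=1e-3, weight_decay=1e-5)

# Per the setup description, a training step would also clip the global
# gradient norm to 1 and add Gaussian weight noise N(0, 0.075) (Graves, 2011):
# torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
```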
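
For reference, the figures quoted in the Research Type row (12.9% vs. 14.8%) are word error rates: word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. Below is a minimal, self-contained sketch of that metric; the paper does not state which scoring tool was used, so this is only a definition-level illustration.

```python
# Word error rate: word-level Levenshtein distance divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(wer("the cat sat", "the cat sit"))  # 0.333... (1 substitution over 3 words)
```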