A fully differentiable beam search decoder
Authors: Ronan Collobert, Awni Hannun, Gabriel Synnaeve
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply DBD to the task of automatic speech recognition and show competitive performance on the Wall Street Journal (WSJ) corpus (Paul & Baker, 1992). ... We performed experiments with WSJ (about 81h of transcribed audio data). |
| Researcher Affiliation | Industry | 1Facebook AI Research. Correspondence to: Ronan Collobert <locronan@fb.com>, Awni Hannun <awni@fb.com>, Gabriel Synnaeve <gab@fb.com>. |
| Pseudocode | No | The paper describes algorithms verbally and mathematically, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing the source code or a link to a code repository. |
| Open Datasets | Yes | We apply DBD to the task of automatic speech recognition and show competitive performance on the Wall Street Journal (WSJ) corpus (Paul & Baker, 1992). |
| Dataset Splits | Yes | We consider the standard subsets si284, nov93dev and nov92 for training, validation and test, respectively. |
| Hardware Specification | No | Both the neural network acoustic model and the ASG criterion run on a single GPU. The DBD criterion is CPU-only. No specific GPU or CPU models are mentioned. |
| Software Dependencies | No | The paper mentions KenLM but does not specify a version number for it or any other key software dependencies. |
| Experiment Setup | Yes | We use log-mel filterbanks as features fed to the acoustic model, with 40 filters of size 25ms, strided by 10ms. ... We consider an end-to-end setup, where the token set D (see Section 2) includes English letters (a-z), the apostrophe and the period character, as well as a space character, leading to 29 different tokens. ... All the models are trained with stochastic gradient descent (SGD), enhanced with gradient clipping (Pascanu et al., 2013) and weight normalization (Salimans & Kingma, 2016). We use batch training (16 utterances at once), sorting inputs by length for efficiency. ... Our best Conv Net acoustic model has 10M parameters and an overall receptive field of 1350ms. ... In most experiments, we use a beam size of 500, as larger beam sizes led to marginal WER improvements. |
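
The setup quoted in the last row is concrete enough to sketch in code. Below is a minimal, hypothetical reconstruction in PyTorch/torchaudio, not the authors' implementation: the 29-token set, the 40-filter log-mel front end (25ms windows, 10ms stride, assuming 16 kHz WSJ audio), and an SGD step with gradient clipping and weight normalization on batches of 16. The acoustic model, learning rate, and clipping threshold are placeholders, and the paper's ASG/DBD criteria and 10M-parameter ConvNet are not reproduced.

```python
import string

import torch
import torchaudio

# 29 tokens: English letters a-z, apostrophe, period, and space.
TOKENS = list(string.ascii_lowercase) + ["'", ".", " "]
assert len(TOKENS) == 29

# 40 log-mel filterbanks over 25 ms windows strided by 10 ms
# (400 / 160 samples, assuming 16 kHz WSJ audio).
melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, win_length=400, hop_length=160, n_mels=40
)

def features(waveform: torch.Tensor) -> torch.Tensor:
    """Log-mel features for a (1, num_samples) waveform -> (num_frames, 40)."""
    return torch.log(melspec(waveform) + 1e-6).squeeze(0).transpose(0, 1)

# Placeholder acoustic model with weight normalization; the paper's model is a
# ~10M-parameter ConvNet with a 1350 ms receptive field, not reproduced here.
model = torch.nn.Sequential(
    torch.nn.utils.weight_norm(torch.nn.Conv1d(40, 256, kernel_size=13, padding=6)),
    torch.nn.ReLU(),
    torch.nn.utils.weight_norm(torch.nn.Conv1d(256, len(TOKENS), kernel_size=1)),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # learning rate is an assumption

def train_step(batch_feats: torch.Tensor, loss_fn) -> float:
    """One SGD step with gradient clipping; batch_feats is (16, 40, num_frames)."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch_feats))  # stand-in for the ASG/DBD criterion
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # clip value assumed
    optimizer.step()
    return float(loss.item())

BEAM_SIZE = 500  # beam size used for decoding in most experiments
```

The only values taken from the paper are the feature parameters, the token set, the batch size implied by the docstring, and the beam size; everything else is an illustrative stand-in.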