Optimal Completion Distillation for Sequence Learning

Authors: Sara Sabour, William Chan, Mohammad Norouzi

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | OCD achieves the state-of-the-art performance on end-to-end speech recognition, on both Wall Street Journal and Librispeech datasets, achieving 9.3% and 4.5% word error rates, respectively.
Researcher Affiliation | Industry | Sara Sabour, William Chan, Mohammad Norouzi {sasabour, williamchan, mnorouzi}@google.com Google Brain
Pseudocode | Yes | Procedure 1 EditDistanceQ op returns Q-values of the tokens at each time step based on the minimum edit distance between a reference sequence r and a hypothesis sequence h of length t.
Open Source Code | No | We are in the process of releasing the code for OCD.
Open Datasets | Yes | We conduct our experiments on speech recognition on the Wall Street Journal (WSJ) (Paul and Baker, 1992) and Librispeech (Panayotov et al., 2015) benchmarks.
Dataset Splits | Yes | We use the standard configuration of si284 for training, dev93 for validation and report both test Character Error Rate (CER) and Word Error Rate (WER) on eval92. [...] For the Librispeech dataset, we train on the full training set (960h audio data) and validate our results on the dev-other set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'TensorFlow (Abadi et al., 2016)' but does not specify a version number for the software itself, nor for any other libraries or dependencies.
Experiment Setup | Yes | Our encoder uses 2-layers of convolutions with 3x3 filters, stride 2x2 and 32 channels, followed by a convolutional LSTM with 1D-convolution of filter width 3, followed by 3 LSTM layers with 256 cell size. [...] train our models for 300 epochs of batch size 8 with 8 async workers. We separately tune the learning rate for our baseline and OCD model, 0.0007 for OCD vs 0.001 for baseline.
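
The Pseudocode entry above describes an op that turns the edit-distance table between a reference r and a hypothesis h into per-step Q-values. The following is a minimal Python sketch of that idea, not the authors' Procedure 1: the function names, the `EOS` symbol, the `vocab` argument, and the handling of the end-of-sequence token (scored here as the negative edit distance incurred by stopping immediately) are all assumptions made for illustration. It relies on the fact that, for a prefix h[:t], the best achievable final edit distance is the minimum m of row t of the table, and that a next token keeps the distance at m only if it equals r[j] for some column j attaining that minimum.

```python
EOS = "</s>"  # hypothetical end-of-sequence symbol, chosen for this sketch


def edit_distance_rows(ref, hyp):
    """Return rows where rows[t][j] is the edit distance between hyp[:t] and ref[:j]."""
    rows = [list(range(len(ref) + 1))]
    for t, tok in enumerate(hyp, start=1):
        prev = rows[-1]
        cur = [t]  # hyp[:t] versus the empty reference prefix
        for j in range(1, len(ref) + 1):
            substitute = prev[j - 1] + (tok != ref[j - 1])
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitute))
        rows.append(cur)
    return rows


def optimal_q_values(ref, hyp, vocab):
    """For each prefix hyp[:t], map every candidate next token to its Q-value.

    Q is the negative of the smallest total edit distance reachable after
    emitting that token (assumption: EOS is scored as stopping right away).
    """
    q_per_step = []
    for row in edit_distance_rows(ref, hyp):
        m = min(row)
        # Any token that is not an optimal continuation costs one extra edit.
        q = {a: -(m + 1) for a in vocab}
        for j, d in enumerate(row):
            if d == m and j < len(ref):
                q[ref[j]] = -m  # continuing with ref[j] keeps the distance at m
        q[EOS] = -row[len(ref)]  # distance between hyp[:t] and the full reference
        q_per_step.append(q)
    return q_per_step


q = optimal_q_values("abc", "x", vocab="abcx")
print(q[1])  # Q-values after the (wrong) first token "x"
```

With reference "abc" and hypothesis "x", both "a" and "b" are optimal continuations after the first step, which illustrates how several next tokens can share the highest Q-value once the hypothesis has deviated from the reference.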