Unsupervised Speech Recognition

Authors: Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results demonstrate the viability of the framework for a variety of settings and languages. wav2vec-U improves the phone error rate (PER) on the small-scale TIMIT benchmark from 26.1 to 11.3 compared to the next best known unsupervised approach. To get a better sense of the performance compared to the best supervised methods, we measure performance on the larger Librispeech benchmark, where our method achieves a word error rate (WER) of 5.9 on test-other."
Researcher Affiliation | Industry | Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli; listed affiliations: Facebook AI, Google AI
Pseudocode | No | The paper does not contain any structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "The code is available at https://github.com/pytorch/fairseq/tree/master/examples/wav2vec/unsupervised"
Open Datasets | Yes | "Librispeech is a standard benchmark in the speech recognition community which provides about 960 hours of transcribed read audiobooks. We use the language modeling data of Librispeech as unlabeled text data for unsupervised training. We also consider self-training over three iterations by first training an HMM on the labels generated by the GAN, then fine-tuning the original wav2vec 2.0 model on the labels of the HMM for Librispeech, followed by fine-tuning on Libri-Light; Appendix F investigates alternatives."
Dataset Splits | Yes | "Librispeech provides clean dev/test sets which are less challenging than the other sets. We measure performance on the standard Kaldi dev and test sets (core-dev/core-test) as well as a slightly larger version of the test set (all-test) to be able to compare to Liu et al. [2018] and Chen et al. [2019]."
Hardware Specification | No | The paper mentions using GPUs for fast clustering with the FAISS library but does not specify particular GPU models, CPU models, or other hardware details used for running the experiments or training the models. (A minimal FAISS clustering sketch follows the table.)
Software Dependencies | No | The paper mentions software such as fairseq, PyTorch, FAISS, and Kaldi, but does not specify version numbers for these or any other key software dependencies. (A version-logging snippet follows the table.)
Experiment Setup | No | The paper describes the model architecture and objective function, including the penalties used (gradient penalty, segment smoothness penalty, phoneme diversity loss), but does not provide specific numerical values for hyperparameters such as learning rate, batch size, number of epochs, or the weights (λ, γ, η) for the loss components. (A hedged sketch of these penalties follows the table.)
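
For context on the Hardware Specification row: the paper's segmentation step clusters wav2vec 2.0 representations with k-means, accelerated by FAISS on GPUs. The sketch below is a minimal illustration of that kind of call, not the authors' code; the feature array, dimensionality, cluster count, and iteration count are illustrative placeholders.

```python
# Minimal sketch (not the paper's code) of GPU-accelerated k-means with FAISS,
# the library the paper names for fast clustering of speech representations.
# Shapes, k, and niter are illustrative placeholders.
import numpy as np
import faiss

feats = np.random.rand(100_000, 512).astype("float32")  # stand-in for wav2vec 2.0 features
kmeans = faiss.Kmeans(d=feats.shape[1], k=128, niter=20, gpu=True)  # gpu=True uses visible GPUs
kmeans.train(feats)
_, cluster_ids = kmeans.index.search(feats, 1)  # nearest centroid per frame
print(cluster_ids.shape)  # (100000, 1)
```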
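
Relatedly, since the Software Dependencies row notes that no versions are pinned, a reproduction would need to record its own environment. A small illustrative way to log the Python-side toolchain is shown below; Kaldi is a C++ toolkit installed separately, so its git commit would have to be noted by hand, and whether a given release of fairseq or faiss exposes `__version__` should be verified locally.

```python
# Log the versions of the Python packages the paper names; useful because the
# paper itself does not pin any. getattr with a default hedges against
# releases that do not expose __version__.
import torch
import fairseq
import faiss

for name, mod in [("torch", torch), ("fairseq", fairseq), ("faiss", faiss)]:
    print(f"{name}: {getattr(mod, '__version__', 'unknown')}")
```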
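
Finally, on the Experiment Setup row: the objective the paper describes augments the GAN loss with a gradient penalty, a segment smoothness penalty, and a phoneme diversity loss, weighted by λ, γ, and η. Below is a hedged PyTorch sketch of what those three terms could look like, reconstructed from the paper's verbal description; the tensor shapes, the toy discriminator, and the weight values are assumptions, not the authors' implementation.

```python
# Hedged sketch of the three wav2vec-U regularizers, reconstructed from the
# paper's verbal description. Not the authors' code; weights are placeholders.
import torch
import torch.nn.functional as F

def gradient_penalty(discriminator, real, fake):
    # Penalize the squared gradient norm of the discriminator on random
    # mixtures of real and generated phoneme sequences (WGAN-GP style).
    alpha = torch.rand(real.size(0), 1, 1, device=real.device)
    mixed = (alpha * real + (1 - alpha) * fake).detach().requires_grad_(True)
    scores = discriminator(mixed)
    (grads,) = torch.autograd.grad(scores.sum(), mixed, create_graph=True)
    return (grads.norm(2, dim=-1) ** 2).mean()

def segment_smoothness(probs):
    # Encourage adjacent segments to emit similar phoneme distributions.
    return ((probs[:, 1:] - probs[:, :-1]) ** 2).sum(-1).mean()

def phoneme_diversity(probs):
    # Negative entropy of the batch-averaged phoneme distribution, pushing
    # the generator to use the whole phoneme inventory.
    mean_probs = probs.mean(dim=(0, 1))
    entropy = -(mean_probs * (mean_probs + 1e-7).log()).sum()
    return -entropy

if __name__ == "__main__":
    B, T, V = 4, 50, 40            # batch, segments, phoneme inventory (toy sizes)
    disc = torch.nn.Sequential(    # stand-in discriminator, not the paper's CNN
        torch.nn.Flatten(1), torch.nn.Linear(T * V, 1))
    real = F.one_hot(torch.randint(V, (B, T)), V).float()
    fake = F.softmax(torch.randn(B, T, V, requires_grad=True), dim=-1)
    lam, gamma, eta = 1.0, 1.0, 1.0  # placeholder weights; the paper's main text omits values
    penalty = (lam * gradient_penalty(disc, real, fake)
               + gamma * segment_smoothness(fake)
               + eta * phoneme_diversity(fake))
    print(float(penalty))
```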