Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching
Authors: Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu
ICLR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on TIMIT dataset demonstrate the success of this fully unsupervised phoneme recognition system, which achieves a phone error rate (PER) of 41.6%. |
| Researcher Affiliation | Collaboration | Chih-Kuan Yeh, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Jianshu Chen, Chengzhu Yu & Dong Yu, Tencent AI Lab, Bellevue, WA 98004, USA. The work was done during an internship at Tencent AI Lab, Bellevue, WA. |
| Pseudocode | Yes | Algorithm 1: Training Algorithm |
| Open Source Code | No | The paper does not provide a link to open-source code for the described methodology or explicitly state that the code is being released. |
| Open Datasets | Yes | We perform experiments on the TIMIT dataset where 6300 prompted English speech sentences are recorded. The preparation of training and test sets follows the standard protocol of the TIMIT dataset. |
| Dataset Splits | Yes | In the first setting, we follow the standard TIMIT partition to divide the training data into training and validation sets of 3696 and 400 utterances, respectively. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run the experiments (e.g., specific GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions 'Kaldi speech recognition toolkit (Povey et al., 2011)' but does not provide specific version numbers for Kaldi or other key software components, which is required for reproducibility. |
| Experiment Setup | Yes | The classifier $p_\theta(y_t \mid x_t)$ is modeled by a fully connected neural network with one hidden layer of 512 ReLU units. The input to the neural network is a concatenation of frames within a context window of size 11. The optimization of (3) is performed with momentum SGD with a fixed schedule of increasing batch size from 5000 to 20000. λ in (3) is chosen to be 10 5. Details of the experiment setting and other hyperparameters can be found in Appendix B. |
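
The classifier described in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it covers only the forward pass (context-window stacking, one hidden layer of 512 ReLU units, softmax output) and not the training objective in (3) or the momentum SGD schedule. The per-frame feature dimension, the number of phoneme classes, the edge-padding rule, and the weight initialisation are assumptions, since this page does not state them.

```python
import math
import random

CONTEXT = 11      # context window size, as quoted from the paper
HIDDEN = 512      # hidden ReLU units, as quoted from the paper
FEAT_DIM = 13     # ASSUMPTION: per-frame feature dimension
NUM_PHONES = 39   # ASSUMPTION: number of output phoneme classes


def stack_context(frames, context=CONTEXT):
    """Concatenate each frame with its neighbours.

    ASSUMPTION: utterance edges are padded by repeating the first/last frame.
    """
    half = context // 2
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - half, t + half + 1):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked


def init_layer(n_in, n_out, rng):
    """Small uniform weights and zero biases (initialisation is an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(weights, bias, x):
    """Compute W x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x, params):
    """Frame-level posterior: one hidden ReLU layer, then softmax over phonemes."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in affine(w1, b1, x)]
    return softmax(affine(w2, b2, hidden))


rng = random.Random(0)
params = (init_layer(CONTEXT * FEAT_DIM, HIDDEN, rng),
          init_layer(HIDDEN, NUM_PHONES, rng))

# Five random frames stand in for real acoustic features.
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(5)]
inputs = stack_context(frames)
posteriors = [classify(x, params) for x in inputs]
```

Each entry of `posteriors` is a probability vector over the assumed phoneme classes for one frame. With untrained weights the values carry no meaning; the sketch is only meant to make the quoted input and layer sizes concrete.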