Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching
Authors: Chih-Kuan Yeh, Jianshu Chen, Chengzhu Yu, Dong Yu
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on TIMIT dataset demonstrate the success of this fully unsupervised phoneme recognition system, which achieves a phone error rate (PER) of 41.6%. |
| Researcher Affiliation | Collaboration | Chih-Kuan Yeh, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA (cjyeh@cs.cmu.edu); Jianshu Chen, Chengzhu Yu & Dong Yu, Tencent AI Lab, Bellevue, WA 98004, USA ({jianshuchen,czyu,dyu}@tencent.com). The work was done during an internship at Tencent AI Lab, Bellevue, WA. |
| Pseudocode | Yes | Algorithm 1: Training Algorithm |
| Open Source Code | No | The paper does not provide a link to open-source code for the described methodology or explicitly state that the code is being released. |
| Open Datasets | Yes | We perform experiments on the TIMIT dataset where 6300 prompted English speech sentences are recorded. The preparation of training and test sets follow the standard protocol of the TIMIT dataset. |
| Dataset Splits | Yes | In the first setting, we follow the standard TIMIT partition to divide the training data into a training and validation sets of 3696 and 400 utterances, respectively. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run the experiments (e.g., specific GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions the 'Kaldi speech recognition toolkit (Povey et al., 2011)' but does not provide specific version numbers for Kaldi or other key software components, which are required for reproducibility. |
| Experiment Setup | Yes | The classifier pθ(yt|xt) is modeled by a fully connected neural network with one hidden layer of 512 ReLU units. The input to the neural network is a concatenation of frames within a context window of size 11. The optimization of (3) is performed with momentum SGD with a fixed schedule of increasing batch size from 5000 to 20000. λ in (3) is chosen to be 10^-5. Details of the experiment setting and other hyperparameters can be found in Appendix B. |
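The classifier described in the setup row can be sketched in a few lines. The following is a minimal numpy illustration, not the authors' code: the hidden width (512 ReLU units) and the context window (11 frames) come from the paper, while the per-frame feature dimension (39) and the number of phoneme classes (48) are assumed values for illustration.

```python
import numpy as np

FRAME_DIM = 39    # assumed per-frame acoustic feature dimension
CONTEXT = 11      # context window size (from the paper)
HIDDEN = 512      # hidden ReLU units (from the paper)
NUM_PHONES = 48   # assumed number of phoneme classes

rng = np.random.default_rng(0)
W1 = rng.standard_normal((FRAME_DIM * CONTEXT, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, NUM_PHONES)) * 0.01
b2 = np.zeros(NUM_PHONES)

def context_windows(frames: np.ndarray) -> np.ndarray:
    """Stack each frame with its +/-5 neighbours (zero-padded at the edges)."""
    pad = CONTEXT // 2
    padded = np.pad(frames, ((pad, pad), (0, 0)))
    return np.stack([padded[t:t + CONTEXT].ravel()
                     for t in range(frames.shape[0])])

def classify(frames: np.ndarray) -> np.ndarray:
    """Per-frame phoneme posteriors p_theta(y_t | x_t) from a one-hidden-layer MLP."""
    x = context_windows(frames)
    h = np.maximum(x @ W1 + b1, 0.0)            # ReLU hidden layer
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)     # softmax over phoneme classes

# Forward pass on a random 100-frame utterance.
posteriors = classify(rng.standard_normal((100, FRAME_DIM)))
print(posteriors.shape)  # (100, 48)
```

This sketch covers only the forward pass of pθ(yt|xt); the paper's training objective (the segmental Empirical-ODM cost in its equation (3), optimized with momentum SGD and the batch-size schedule noted above) is not reproduced here.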