Towards Zero-Shot Learning for Automatic Phonemic Transcription
Authors: Xinjian Li, Siddharth Dalmia, David Mortensen, Juncheng Li, Alan Black, Florian Metze (pp. 8261-8268)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model by training it using 13 languages and testing it using 7 unseen languages. We find that it achieves 7.7% better phoneme error rate on average over a standard multilingual model. |
| Researcher Affiliation | Academia | Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W Black, Florian Metze Language Technologies Institute, School of Computer Science Carnegie Mellon University {xinjianl, sdalmia, dmortens, junchenl, awb, fmetze}@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: A simple algorithm to assign attributes to phonemes |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of its source code. |
| Open Datasets | Yes | We prepare two datasets for this experiment. The training set consists of 17 corpora from 13 languages, and the test set is composed of corpora from 7 different languages. ... Details regarding each corpus and each language are provided in Table 1. |
| Dataset Splits | Yes | We note that 5 percent of the entire corpus was used as the validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions using the "EESEN framework" and "Epitran" but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The input feature is 40 dimension high-resolution MFCCs, the encoder is a 5 layer Bidirectional LSTM model, each layer having 320 cells. ... We train the acoustic model with stochastic gradient descent, using a learning rate of 0.005. |
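
The "Pseudocode" row refers to the paper's Algorithm 1, which assigns attributes to phonemes. The paper's exact procedure is not reproduced here; the following is a minimal Python sketch assuming a hand-written attribute inventory and a simple lookup. The attribute names and the `ATTRIBUTE_TABLE` mapping are illustrative placeholders, not the authors' actual inventory.

```python
# Minimal sketch of mapping phonemes to binary attribute vectors.
# The attribute inventory and table below are illustrative placeholders,
# not the inventory used in the paper.

ATTRIBUTES = ["consonant", "voiced", "nasal", "bilabial", "vowel", "high", "front"]

ATTRIBUTE_TABLE = {
    "p": {"consonant", "bilabial"},
    "b": {"consonant", "voiced", "bilabial"},
    "m": {"consonant", "voiced", "nasal", "bilabial"},
    "i": {"vowel", "voiced", "high", "front"},
    "u": {"vowel", "voiced", "high"},
}

def assign_attributes(phoneme):
    """Return a binary attribute vector for a phoneme (1 = attribute present)."""
    present = ATTRIBUTE_TABLE.get(phoneme, set())
    return [1 if attr in present else 0 for attr in ATTRIBUTES]

if __name__ == "__main__":
    for ph in ["m", "i"]:
        print(ph, assign_attributes(ph))
```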
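The "Dataset Splits" row reports that 5 percent of the corpus was held out for validation. Below is a minimal sketch of such a split, assuming utterance identifiers are shuffled with a fixed seed; the seed and the dummy utterance list are assumptions, not details from the paper.

```python
import random

def split_corpus(utterance_ids, valid_fraction=0.05, seed=0):
    """Shuffle utterance ids and hold out a fraction for validation."""
    ids = list(utterance_ids)
    random.Random(seed).shuffle(ids)
    n_valid = max(1, int(len(ids) * valid_fraction))
    return ids[n_valid:], ids[:n_valid]  # (train, validation)

train_ids, valid_ids = split_corpus([f"utt{i:04d}" for i in range(1000)])
print(len(train_ids), len(valid_ids))  # 950 50
```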
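The "Experiment Setup" row describes the acoustic encoder: 40-dimensional MFCC input, a 5-layer bidirectional LSTM with 320 cells per layer, trained with stochastic gradient descent at a learning rate of 0.005. The PyTorch sketch below wires up that configuration; the output layer size, the CTC objective (assumed because the paper builds on the CTC-based EESEN framework), and the toy batch are assumptions rather than a reimplementation of the authors' code.

```python
import torch
import torch.nn as nn

class PhonemeEncoder(nn.Module):
    """5-layer BiLSTM encoder over 40-dim MFCC frames (layer sizes from the paper)."""

    def __init__(self, n_mfcc=40, hidden=320, n_layers=5, n_outputs=64):
        super().__init__()
        # n_outputs (phoneme/attribute targets + blank) is an assumption.
        self.lstm = nn.LSTM(n_mfcc, hidden, num_layers=n_layers,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_outputs)

    def forward(self, x):  # x: (batch, time, 40)
        out, _ = self.lstm(x)
        return self.proj(out).log_softmax(dim=-1)

model = PhonemeEncoder()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)  # learning rate from the paper
ctc_loss = nn.CTCLoss(blank=0)  # CTC objective assumed from the EESEN framework

# Toy forward/backward pass with random features and targets.
feats = torch.randn(2, 100, 40)
targets = torch.randint(1, 64, (2, 20))
log_probs = model(feats).transpose(0, 1)           # (time, batch, classes) for CTCLoss
input_lens = torch.full((2,), 100, dtype=torch.long)
target_lens = torch.full((2,), 20, dtype=torch.long)
loss = ctc_loss(log_probs, targets, input_lens, target_lens)
loss.backward()
optimizer.step()
```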