Star Temporal Classification: Sequence Modeling with Partially Labeled Data
Authors: Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on automatic speech recognition. These experiments show that STC can close the performance gap with the supervised baseline to about 1% WER when up to 70% of the labels are missing. We also perform experiments in handwriting recognition to show that our method easily applies to other sequence classification tasks. |
| Researcher Affiliation | Industry | Vineel Pratap (Meta AI); Awni Hannun (Zoom AI); Gabriel Synnaeve (Meta AI); Ronan Collobert (Meta AI; currently at Apple). |
| Pseudocode | No | The paper includes figures illustrating WFST compositions and the STC training pipeline, but it does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions that the model architectures and STC loss are implemented with the ASR application of the flashlight machine-learning framework and the C++ API of GTN, and refers to K2, providing links to these third-party frameworks. However, it does not explicitly state that *their specific implementation code* for STC or the experiments is open-source or provide a direct link to it. |
| Open Datasets | Yes | We use the LibriSpeech [33] dataset, containing 960 hours of training audio with paired transcriptions, for our speech recognition experiments. ... We test our approach on the IAM Handwriting database [26], which is a widely used benchmark for handwriting recognition. |
| Dataset Splits | Yes | The standard LibriSpeech validation sets (dev-clean and dev-other) are used to tune all hyperparameters, as well as to select the best models. ... We use the Aachen data splits to divide the dataset into three subsets: 6,482 lines for training, 976 lines for validation and 2,915 lines for testing. |
| Hardware Specification | Yes | The experiments on LibriSpeech and IAM are run on 32 and 8 Nvidia 32GB V100 GPUs, using the C++ and Python APIs of GTN, respectively. |
| Software Dependencies | No | The paper mentions the use of 'the ASR application [35] of the flashlight machine-learning framework' and 'the C++ API of GTN', and also refers to 'GTN [15], K2 [19]', but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | We quantify this using a parameter p_drop ∈ [0, 1], which denotes the probability of dropping a word from the transcript label. ... We use a parameter called token insertion penalty, λ, to add a penalty when using a lot of new tokens. ... p_t = p_max + (p_0 − p_max)·exp(−t/τ), λ_t = ln(p_t) (4), where p_0, p_max, τ are hyperparameters... Additional implementation and training details (models, tokens, optimization, other hyperparameters and settings) for the ASR and HWR experiments are in Appendix B. |
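The two setup mechanisms quoted above (randomly dropping words with probability p_drop to simulate partial labels, and annealing the token insertion penalty λ_t = ln(p_t) with p_t = p_max + (p_0 − p_max)·exp(−t/τ)) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function names and the example hyperparameter values below are our own placeholders, not values taken from the paper.

```python
import math
import random


def drop_words(transcript, p_drop, rng):
    """Simulate partially labeled data: drop each word of the
    transcript independently with probability p_drop."""
    return [w for w in transcript.split() if rng.random() >= p_drop]


def token_insertion_penalty(t, p0, p_max, tau):
    """Annealed token insertion penalty (Eq. 4 in the paper):
    p_t = p_max + (p0 - p_max) * exp(-t / tau),  lambda_t = ln(p_t).
    Starts at ln(p0) at step t=0 and approaches ln(p_max) as t grows."""
    p_t = p_max + (p0 - p_max) * math.exp(-t / tau)
    return math.log(p_t)


if __name__ == "__main__":
    rng = random.Random(0)
    # Drop ~50% of the words from a toy transcript (placeholder text).
    print(drop_words("the quick brown fox jumps over the lazy dog", 0.5, rng))
    # Penalty schedule with placeholder hyperparameters p0, p_max, tau.
    for t in (0, 100, 1000):
        print(t, token_insertion_penalty(t, p0=0.05, p_max=0.9, tau=200.0))
```

Since p_t ≤ 1, λ_t is non-positive, so it acts as a penalty in log-probability space; the exponential schedule relaxes it (toward ln(p_max)) as training progresses.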