Star Temporal Classification: Sequence Modeling with Partially Labeled Data

Authors: Vineel Pratap, Awni Hannun, Gabriel Synnaeve, Ronan Collobert

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments on automatic speech recognition. These experiments show that STC can close the performance gap with the supervised baseline to about 1% WER when up to 70% of the labels are missing. We also perform experiments in handwriting recognition to show that our method easily applies to other sequence classification tasks.
Researcher Affiliation | Industry | Vineel Pratap (Meta AI), Awni Hannun (Zoom AI), Gabriel Synnaeve (Meta AI), Ronan Collobert (Meta AI; currently at Apple).
Pseudocode | No | The paper includes figures illustrating WFST compositions and the STC training pipeline, but it does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions that the model architectures and STC loss are implemented with the ASR application of the flashlight machine-learning framework and the C++ API of GTN, and refers to K2, providing links to these third-party frameworks. However, it does not explicitly state that *their specific implementation code* for STC or the experiments is open source, nor does it provide a direct link to it. (A toy sketch of the kind of GTN composition involved is given after the table.)
Open Datasets | Yes | We use LibriSpeech [33] dataset, containing 960 hours of training audio with paired transcriptions for our speech recognition experiments. ... We test our approach on IAM Handwriting database [26], which is a widely used benchmark for handwriting recognition.
Dataset Splits | Yes | The standard LibriSpeech validation sets (dev-clean and dev-other) are used to tune all hyperparameters, as well as to select the best models. ... We use Aachen data splits to divide the dataset into three subsets: 6,482 lines for training, 976 lines for validation and 2,915 lines for testing.
Hardware Specification | Yes | The experiments on LibriSpeech and IAM are run on 32 and 8 Nvidia 32GB V100 GPUs, and use the C++ and Python APIs of GTN, respectively.
Software Dependencies | No | The paper mentions the use of 'the ASR application [35] of the flashlight machine-learning framework' and 'the C++ API of GTN', and also 'GTN [15], K2 [19]', but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | We quantify this using a parameter p_drop ∈ [0, 1], which denotes the probability of dropping a word from the transcript label. ... We use a parameter called token insertion penalty, λ, to add a penalty when using a lot of new tokens. ... p_t = p_max + (p_0 - p_max) exp(-t/τ), λ_t = ln(p_t) (Eq. 4), where p_0, p_max, τ are hyperparameters... Additional implementation and training details (models, tokens, optimization, other hyperparameters and settings) for the ASR and HWR experiments are in Appendix B. (A small numeric sketch of the p_drop and λ_t parameters follows the table.)
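To make the quoted setup concrete, here is a small numeric sketch in Python of the two quantities above: simulating partial labels by dropping words with probability p_drop, and annealing the token insertion penalty λ_t = ln(p_t) with p_t = p_max + (p_0 - p_max) exp(-t/τ) (Eq. 4). This is our reading of the equation, not code from the paper; the helper names (drop_words, insertion_penalty) and the hyperparameter values are illustrative, with the actual values reported in the paper's Appendix B.

```python
import math
import random

def drop_words(transcript, p_drop, rng=random):
    """Simulate a partially labeled transcript by dropping each word
    independently with probability p_drop (illustrative helper)."""
    return [w for w in transcript.split() if rng.random() >= p_drop]

def insertion_penalty(t, p0=0.05, p_max=0.95, tau=10_000):
    """Annealed token insertion penalty lambda_t = ln(p_t), where
    p_t = p_max + (p0 - p_max) * exp(-t / tau)  (Eq. 4).
    p0, p_max and tau here are placeholder values, not the paper's."""
    p_t = p_max + (p0 - p_max) * math.exp(-t / tau)
    return math.log(p_t)

print(drop_words("he began a confused complaint", p_drop=0.7))
for t in (0, 10_000, 50_000):
    print(f"t={t:>6}  lambda_t={insertion_penalty(t):.3f}")
```

With these illustrative values the penalty starts at ln(p_0) ≈ -3.0, strongly penalizing <star> insertions early in training, and relaxes toward ln(p_max) ≈ -0.05 as t grows.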
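Since the paper describes the STC loss through WFST compositions rather than pseudocode (see the Pseudocode and Open Source Code rows), the sketch below shows, in the Python API of GTN, the general shape of such a composition: a label graph for a partial transcript is augmented with penalized <star> self-loops and composed with an emissions graph, and the forward score sums over all compatible alignments. This is a toy illustration, not the authors' implementation; the vocabulary, scores and penalty are made up, and CTC blanks as well as the paper's <star> posterior are omitted.

```python
import gtn

A, B, STAR = 0, 1, 2   # toy tokens: 'a', 'b', and a <star> wildcard
LAMBDA = -1.0          # made-up token insertion penalty on <star> arcs (log-space)

# Label graph for the partial transcript "a b": <star> self-loops before,
# between and after the given labels absorb tokens missing from the label.
labels = gtn.Graph(False)          # no gradient needed for the label graph
n0 = labels.add_node(True)         # start node
n1 = labels.add_node()
n2 = labels.add_node(False, True)  # accept node
labels.add_arc(n0, n1, A)          # consume 'a'
labels.add_arc(n1, n2, B)          # consume 'b'
for n in (n0, n1, n2):
    labels.add_arc(n, n, STAR, STAR, LAMBDA)  # optionally absorb missing tokens

# Emissions graph: a linear chain over the frames, with made-up
# log-probabilities of each token at each frame as arc weights.
log_probs = [
    [-0.2, -1.8, -2.0],  # frame 0
    [-1.5, -0.4, -2.2],  # frame 1
    [-1.0, -1.2, -0.9],  # frame 2
    [-2.0, -0.3, -1.6],  # frame 3
]
emissions = gtn.Graph(True)        # gradients flow back to the emissions
prev = emissions.add_node(True)    # start node
for t, frame in enumerate(log_probs):
    cur = emissions.add_node(False, t == len(log_probs) - 1)  # accept at last frame
    for tok in (A, B, STAR):
        emissions.add_arc(prev, cur, tok, tok, frame[tok])
    prev = cur

# Compose the graphs and log-sum over every alignment that is consistent
# with the partial transcript; the negated forward score acts as the loss.
constrained = gtn.compose(emissions, labels)
loss = gtn.negate(gtn.forward_score(constrained))
print("toy STC-style loss:", loss.item())
```

In the paper's setting the weight on the <star> arcs would be the annealed λ_t sketched above rather than a constant, and the emissions would come from the acoustic model's log-probabilities.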