W-CTC: a Connectionist Temporal Classification Loss with Wild Cards

Authors: Xingyu Cai, Jiahong Yuan, Yuchen Bian, Guangxu Xun, Jiaji Huang, Kenneth Church

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evaluations on a number of tasks in speech and vision domains show that the proposed W-CTC consistently outperforms the standard CTC by a large margin when labels are incomplete. The effectiveness of the proposed method is further confirmed in an ablation study."
Researcher Affiliation | Industry | "Xingyu Cai, Jiahong Yuan, Yuchen Bian, Guangxu Xun, Jiaji Huang, Kenneth Church; Baidu Research, 1195 Bordeaux Dr, Sunnyvale, CA 94089, USA; xingyucai@baidu.com"
Pseudocode | No | The paper describes the algorithm steps in text and equations but does not include a formally labeled pseudocode or algorithm block (an illustrative sketch of the wildcard idea is given after this table).
Open Source Code | Yes | "All the codes can be found at https://github.com/TideDancer/iclr22-wctc."
Open Datasets | Yes | "We use the TIMIT (Garofolo, 1993) dataset in this experiment... Two standard collections were used for training: (a) MJSynth (MJ, 9 million images) (Jaderberg et al., 2014) and (b) SynthText (ST, 800k images) (Gupta et al., 2016)... The dataset is PHOENIX14T (Camgoz et al., 2018)."
Dataset Splits | No | The paper mentions training and test sets but does not explicitly detail validation splits or how they were derived, beyond general statements about evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory) used for running its experiments, only general statements about backbone models.
Software Dependencies | No | The paper lists external code repositories for its implementations (e.g., huggingface/transformers, Media-Smart/vedastr, neccam/slt), but does not provide specific version numbers for the underlying software libraries or programming languages (e.g., Python, PyTorch).
Experiment Setup | Yes | "Table 3: List of key training hyper-parameters." Columns are Task, Batch size, Optimizer, LR, Steps. ASR: 32, AdamW, 1e-4, 7k (50 epochs); PR: 32, AdamW, 1e-4, 7k (50 epochs); OCR: 500, AdaDelta, 1, 150k; CSLR: 32, Adam, 1e-3, stop if no better for 800 steps.
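
The Table 3 values map directly onto standard optimizer constructors. Below is a minimal sketch, assuming PyTorch; the dictionary holds the Table 3 numbers, while the helper name build_optimizer and the stand-in model are invented for illustration and are not from the paper or its repository.

```python
# Illustrative sketch only: the hyper-parameter values are copied from Table 3
# of the paper; the helper and the toy model are invented for this example.
import torch

TABLE3 = {
    # task: (batch size, optimizer, learning rate, schedule note)
    "ASR":  (32,  "AdamW",    1e-4, "7k steps (50 epochs)"),
    "PR":   (32,  "AdamW",    1e-4, "7k steps (50 epochs)"),
    "OCR":  (500, "Adadelta", 1.0,  "150k steps"),
    "CSLR": (32,  "Adam",     1e-3, "stop if no better for 800 steps"),
}

def build_optimizer(task, params):
    """Turn one Table 3 row into a torch.optim optimizer (not the authors' code)."""
    _, opt_name, lr, _ = TABLE3[task]
    opt_cls = {"AdamW": torch.optim.AdamW,
               "Adam": torch.optim.Adam,
               "Adadelta": torch.optim.Adadelta}[opt_name]
    return opt_cls(params, lr=lr)

# Example: optimizer for the phoneme recognition (PR) setting.
model = torch.nn.Linear(80, 40)  # stand-in for the actual backbone
optimizer = build_optimizer("PR", model.parameters())
```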
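
Since the paper provides no pseudocode, the following is a rough illustration of the wildcard-prefix idea only: it emulates a wildcard at the head of the label by marginalizing the standard CTC likelihood over every possible start frame. This is not the authors' single-pass dynamic-programming formulation (see their repository for that), it does not handle a missing label tail, and the function name wildcard_prefix_ctc is invented.

```python
# Rough emulation of a wildcard label prefix, NOT the paper's algorithm:
# P(y | x) is approximated by summing the standard CTC likelihood over every
# start frame, so the frames before `start` are absorbed by the wildcard.
import torch
import torch.nn.functional as F

def wildcard_prefix_ctc(log_probs, targets, input_lengths, target_lengths, blank=0):
    """log_probs: (T, N, C) log-softmax outputs; targets: (N, S) padded label ids."""
    # only use start offsets that leave enough frames for every target in the batch
    max_start = int((input_lengths - target_lengths).min().item())
    log_liks = []
    for start in range(max_start + 1):
        nll = F.ctc_loss(log_probs[start:], targets,
                         input_lengths - start, target_lengths,
                         blank=blank, reduction="none",
                         zero_infinity=True)  # crude guard against rare infeasible offsets
        log_liks.append(-nll)  # per-utterance log P(y | frames start..T)
    # marginalize over where the labelled segment starts
    log_lik = torch.logsumexp(torch.stack(log_liks, dim=0), dim=0)
    return -log_lik.mean()

# Toy usage: 3 utterances of 50 frames, 20-class vocabulary, 5-token labels.
log_probs = torch.randn(50, 3, 20).log_softmax(-1).requires_grad_()
targets = torch.randint(1, 20, (3, 5))
loss = wildcard_prefix_ctc(log_probs, targets,
                           torch.full((3,), 50, dtype=torch.long),
                           torch.full((3,), 5, dtype=torch.long))
loss.backward()
```

This brute-force version costs one CTC evaluation per start frame; the paper's W-CTC instead folds the wildcard into the CTC forward recursion, keeping roughly the cost of a single pass.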