Don’t Be So Sure! Boosting ASR Decoding via Confidence Relaxation
Authors: Tomer Wullach, Shlomo E. Chazan
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach by conducting an empirical study on varying amounts of labeled resources and different model sizes, showing consistent improvements in particular when applied to low-resource scenarios. |
| Researcher Affiliation | Industry | Origin AI, Ramat-Gan, Israel; {tomerw, shlomi}@originai.co |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions external platforms and toolkits used (Hugging Face, pyctcdecode, Ax toolkit) and provides their URLs, but it does not state that the authors' own implementation code for their proposed methodology is open-source or provide a link to it. |
| Open Datasets | Yes | We conduct our experiments using pre-trained models downloaded from the Hugging Face platform (https://huggingface.co/models), which were fine-tuned using 10m, 1h, 10h, 100h, 360h and 960h train subsets of Librispeech. The model size varies between 95M and 964M parameters, and pre-training was performed using either 960 hours of Librispeech (LS-960) (Panayotov et al. 2015) or 60k hours of Libri-Light (LL-60K) (Kahn et al. 2020). |
| Dataset Splits | Yes | We tune the aggregation hyperparameters, namely, the number of aggregated layers (M) and aggregation tradeoff coefficient (β) using grid search employed on the dev-clean set. We evaluate both setups using the Librispeech test-clean, test-other, dev-clean, dev-other splits, and report the results using the Word Error Rate (WER) and Character Error Rate (CER) metrics. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software tools like Hugging Face, pyctcdecode, and Ax toolkit, but it does not specify version numbers for these or other relevant software dependencies. |
| Experiment Setup | Yes | The beam search decoder parameters, α1 and α2 (see eq. 3), were searched and optimized using the Ax toolkit (https://github.com/facebook/Ax). We tune the aggregation hyperparameters, namely, the number of aggregated layers (M) and aggregation tradeoff coefficient (β) using grid search employed on the dev-clean set. Table 1 summarizes the aggregation hyperparameters for each of the experimented models. |
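The paper reports results with the Word Error Rate (WER) metric. As a minimal, self-contained sketch (not the authors' code, which is not released), WER can be computed as the word-level Levenshtein distance between reference and hypothesis transcripts, normalized by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)
```

The same function applied to character sequences instead of word sequences yields the CER metric also quoted above.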