Understanding the Role of Self Attention for Efficient Speech Recognition

Authors: Kyuhong Shim, Jungwook Choi, Wonyong Sung

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate this idea, we implement the layer-wise attention map reuse on real GPU platforms and achieve up to 1.96 times speedup in inference and 33% savings in training time with noticeably improved ASR performance for the challenging benchmark on LibriSpeech dev/test-other dataset."
Researcher Affiliation | Academia | Kyuhong Shim¹, Jungwook Choi², Wonyong Sung¹; ¹Department of Electrical and Computer Engineering, Seoul National University; ²Department of Electrical Engineering, Hanyang University
Pseudocode | No | The paper describes computational procedures and equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We also provide the source code for the experiments in supplemental materials."
Open Datasets | Yes | "We train and evaluate the model on the LibriSpeech-960 (Panayotov et al., 2015) dataset."
Dataset Splits | Yes | "We train and evaluate the model on the LibriSpeech-960 (Panayotov et al., 2015) dataset." ... "Table 2: Word error rate (%) for different attention map reuse configurations." ... dev-clean / dev-other / test-clean / test-other
Hardware Specification | Yes | "Inference speed is evaluated on a single RTX Titan (24GB) GPU and training cost is measured in GPU-hours on A100 (40GB) GPU."
Software Dependencies | No | The paper mentions software components and frameworks such as Conformer, CTC, SentencePiece, AdamW, MFA, SyncBN, SpecAugment, and SWA, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | "Please see Appendix A.1 and A.2 for the model configuration and training details." ... "Table 4: Conformer-M implementation details." ... "Table 5: Training details including optimizer, scheduler, augmentation and other hyper-parameters."
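The Research Type row quotes the paper's central idea: layer-wise attention map reuse, where the attention probabilities computed in one layer are shared by the layers that follow it so those layers can skip the query/key projections and the softmax. The sketch below is a minimal, hypothetical illustration of that mechanism in plain PyTorch; the class and argument names are ours, and the actual model in the paper is a Conformer with convolution modules and relative positional attention, which this sketch omits.

```python
# Minimal sketch of layer-wise attention map reuse (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReuseSelfAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, attn_probs=None):
        # x: (batch, time, d_model)
        B, T, _ = x.shape
        v = self.v_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        if attn_probs is None:
            # Source layer: compute the attention map once.
            q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            k = self.k_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_head ** 0.5
            attn_probs = F.softmax(scores, dim=-1)
        # Reusing layers skip the q/k projections, score computation, and softmax.
        ctx = torch.matmul(attn_probs, v).transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(ctx), attn_probs

if __name__ == "__main__":
    layers = nn.ModuleList(ReuseSelfAttention(256, 4) for _ in range(3))
    x = torch.randn(2, 100, 256)
    out, probs = layers[0](x)                  # source layer computes the map
    for layer in layers[1:]:
        out, _ = layer(out, attn_probs=probs)  # later layers reuse it
```

Skipping the score computation and softmax in the reusing layers is where the reported inference speedup and training-time savings would come from.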
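The Open Datasets and Dataset Splits rows point to LibriSpeech-960 and its standard dev/test splits. As a hedged, illustrative example (not the paper's data pipeline), torchaudio's built-in LIBRISPEECH dataset class can fetch those splits; the root path and helper name below are placeholders.

```python
# Illustrative only: fetching the LibriSpeech evaluation splits with torchaudio.
# The paper trains on the full 960 h set (train-clean-100/360 + train-other-500).
import torchaudio

EVAL_SPLITS = ["dev-clean", "dev-other", "test-clean", "test-other"]

def load_eval_splits(root: str = "./data"):
    # Each item yields (waveform, sample_rate, transcript, speaker, chapter, utterance).
    return {
        split: torchaudio.datasets.LIBRISPEECH(root, url=split, download=True)
        for split in EVAL_SPLITS
    }

if __name__ == "__main__":
    for name, ds in load_eval_splits().items():
        print(name, len(ds), "utterances")
```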
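The Hardware Specification row reports inference speed measured on a single GPU. A common way to obtain such numbers is CUDA-event timing with warm-up iterations; the snippet below is a generic sketch under that assumption, not the authors' benchmark script.

```python
# Generic GPU latency measurement sketch (assumed methodology, not from the paper).
import torch

@torch.no_grad()
def time_forward(model, batch, n_warmup: int = 10, n_iters: int = 50) -> float:
    model.eval().cuda()
    batch = batch.cuda()
    for _ in range(n_warmup):   # warm-up to stabilise clocks and kernel selection
        model(batch)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(n_iters):
        model(batch)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / n_iters   # milliseconds per forward pass
```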