Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Authors: Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Squeezeformer achieves state-of-the-art results of 7.5%, 6.5%, and 6.0% word-error-rate (WER) on LibriSpeech test-other without external language models, which are 3.1%, 1.4%, and 0.6% better than Conformer-CTC with the same number of FLOPs. Our code is open-sourced and available online [25].
Researcher Affiliation | Collaboration | 1 University of California, Berkeley; 2 ICSI; 3 LBNL. {sehoonkim, amirgh, nicholas_lee, mangalam, malik, mahoneymw, keutzer}@berkeley.edu; Albertshaw@google.com
Pseudocode | No | The paper describes the architecture and modifications using text and diagrams, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is open-sourced and available online [25]. Our code along with the checkpoints for all of the trained models is open-sourced and available online [25].
Open Datasets | Yes | We train both Conformer-CTC and Squeezeformer on the LibriSpeech-960hr [41]. Reference [41]: Librispeech: an ASR corpus based on public domain audio books. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5206-5210, 2015.
Dataset Splits | Yes | Table 3: WER (%) comparison on LibriSpeech dev and test datasets for Squeezeformer and other state-of-the-art CTC models for ASR including Conformer-CTC, QuartzNet [27], CitriNet [36], Transformer-CTC [31], and Efficient Conformer-CTC [4]. For comparison, we include the number of parameters, FLOPs, and throughput (Thp) on a single NVIDIA Tesla A100 GPU for a 30s input in the last three columns. The performance numbers for Conformer-CTC are based on our own reproduction to the best performance possible, and the others are the reported numbers in their papers [4, 27, 36]. With and without the grouped attention. Table columns: Model, dev-clean, dev-other, test-clean, test-other, Params (M), GFLOPs, Thp (ex/s). (A loading sketch for these standard splits appears after this table.)
Hardware Specification | Yes | We train both Conformer-CTC and Squeezeformer on the LibriSpeech-960hr [41] for 500 epochs on Google's Cloud TPUs v3 with batch size 1024 for the small and medium variants and 2048 for the large variants. Throughput (Thp, ex/s) is reported on a single NVIDIA Tesla A100 GPU.
Software Dependencies | No | The paper mentions using the AdamW optimizer, but it does not provide version numbers for any software, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, TensorFlow, or CUDA versions).
Experiment Setup | Yes | We train both Conformer-CTC and Squeezeformer on the LibriSpeech-960hr [41] for 500 epochs on Google's Cloud TPUs v3 with batch size 1024 for the small and medium variants and 2048 for the large variants. We use the AdamW [33] optimizer with weight decay 5e-4 for all models. (An optimizer configuration sketch appears after this table.)
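
The splits quoted in the Dataset Splits row (dev-clean, dev-other, test-clean, test-other) are the standard LibriSpeech partitions, and LibriSpeech-960hr is the union of the three public training subsets. Below is a minimal loading sketch, assuming torchaudio is available; it only illustrates the public splits and is not the paper's actual data pipeline.

```python
# Hedged sketch: the standard LibriSpeech partitions referenced in Table 3.
# Assumes torchaudio is installed; this illustrates the public splits only and
# is not the paper's data pipeline.
import torchaudio

ROOT = "./data"  # hypothetical local path

# LibriSpeech-960hr is the union of these three training subsets.
TRAIN_SUBSETS = ["train-clean-100", "train-clean-360", "train-other-500"]
# Evaluation splits reported in Table 3.
EVAL_SUBSETS = ["dev-clean", "dev-other", "test-clean", "test-other"]

# Download only the (small) evaluation splits for this illustration.
eval_sets = {
    name: torchaudio.datasets.LIBRISPEECH(ROOT, url=name, download=True)
    for name in EVAL_SUBSETS
}

# Each item is (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id).
waveform, sample_rate, transcript, speaker_id, chapter_id, utt_id = eval_sets["dev-clean"][0]
print(sample_rate, transcript[:60])
```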
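
The Experiment Setup row quotes AdamW with weight decay 5e-4, 500 training epochs, and batch sizes of 1024 or 2048. The following is a minimal PyTorch-style sketch of that optimizer configuration; the model, learning rate, and schedule are placeholders, since the quoted text does not specify them, and the paper's released code may use a different framework.

```python
# Hedged sketch of the quoted optimizer settings (AdamW, weight decay 5e-4).
# The model, learning rate, and schedule are assumptions; only the weight decay,
# epoch count, and batch sizes come from the quoted setup.
import torch

model = torch.nn.Linear(80, 256)   # placeholder; stands in for the ASR encoder
EPOCHS = 500                        # quoted: 500 epochs
BATCH_SIZE = 1024                   # quoted: 1024 (S/M variants) or 2048 (L variants)

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,                        # assumed value; not stated in the quoted text
    weight_decay=5e-4,              # quoted weight decay
)
```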