Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
Authors: Jiaheng Dong, Hong Jia, Soumyajit Chatterjee, Abhirup Ghosh, James Bailey, Ting Dang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on four noisy speech datasets spanning sixteen acoustic conditions demonstrate consistent improvements, with 4.1%-13.5% accuracy gains over backpropagation-free baselines and 2.0×-6.4× GPU memory savings compared to backpropagation-based methods. |
| Researcher Affiliation | Collaboration | Jiaheng Dong (The University of Melbourne); Hong Jia (University of Auckland); Soumyajit Chatterjee (Nokia Bell Labs, UK); Abhirup Ghosh (University of Birmingham); James Bailey (The University of Melbourne); Ting Dang (The University of Melbourne) |
| Pseudocode | Yes | Appendix D (Algorithms): The algorithms for LPA per utterance and for T-EMA are shown in Algorithm 1 and Algorithm 2, respectively. |
| Open Source Code | Yes | Code is available at: https://github.com/JiahengDong/E-BATS |
| Open Datasets | Yes | We evaluate the proposed method on four datasets across sixteen acoustic conditions... We introduce synthetic noise to the LibriSpeech test-other split [38]... We use the CHiME-3 dataset [39]... Common Voice (CV) [40]... TEDLIUM-v2 (TED) [41]... |
| Dataset Splits | Yes | The test sets encompass three categories of acoustic variability... We introduce synthetic noise to the LibriSpeech test-other split [38]... We utilize the official simulated and real enhanced evaluation sets from CHiME-3 [39]... The test set from the en-June-22nd-2020 release was used... We use the official test set for experiments... |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | For E-BATS, we set the CMA-ES population size J = 50. The loss function coefficients are α = 1.0 and β = 2.0. We use Hmin = 0.0, Hmax = 5.0 in calculating the confidence-weighted coefficient c with cmax = 2.0 optimized over {1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0}. Evaluation is performed using two commonly used SFMs, Wav2Vec2ForCTC-Base [42] and HuBERTForCTC-Large [43]; both models are fine-tuned on LibriSpeech and then are adapted in our experiments. For T-EMA, we select γ = 0.9 for Wav2Vec2 and γ = 0.8 for HuBERT after tuning over {0.7, 0.8, 0.9, 0.95, 0.99}. We use Word Error Rate (WER) [44] as the evaluation metric... All TTA baselines are configured for per-utterance adaptation with batch size of 1. |
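
The Experiment Setup row names Word Error Rate (WER) as the evaluation metric. As a rough illustration only, WER is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes simple whitespace tokenization with no text normalization; the function names `word_edit_distance`, `word_error_rate`, and `corpus_wer` are hypothetical and are not taken from the paper or its code release, which may compute WER differently.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences (unit costs)."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one utterance; tokenization by whitespace is an assumption."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_errors += word_edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word in total")
    return total_errors / total_words
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, and that the corpus-level figure weights long utterances more heavily than a plain average of per-utterance scores would.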