Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models

Authors: Jiaheng Dong, Hong Jia, Soumyajit Chatterjee, Abhirup Ghosh, James Bailey, Ting Dang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments conducted on four noisy speech datasets spanning sixteen acoustic conditions demonstrate consistent improvements, with 4.1%–13.5% accuracy gains over backpropagation-free baselines and 2.0–6.4× GPU memory savings compared to backpropagation-based methods.
Researcher Affiliation	Collaboration	Jiaheng Dong (The University of Melbourne); Hong Jia (University of Auckland); Soumyajit Chatterjee (Nokia Bell Labs, UK); Abhirup Ghosh (University of Birmingham); James Bailey (The University of Melbourne); Ting Dang (The University of Melbourne)
Pseudocode	Yes	D Algorithms: The algorithm for LPA per utterance and for T-EMA is shown in Algorithm 1 and Algorithm 2 respectively.
Open Source Code	Yes	Code is available at: https://github.com/JiahengDong/E-BATS
Open Datasets	Yes	We evaluate the proposed method on four datasets across sixteen acoustic conditions... We introduce synthetic noise to the LibriSpeech test-other split [38]... We use the CHiME-3 dataset [39]... Common Voice (CV) [40]... TEDLIUM-v2 (TED) [41]...
Dataset Splits	Yes	The test sets encompass three categories of acoustic variability... We introduce synthetic noise to the LibriSpeech test-other split [38]... We utilize the official simulated and real enhanced evaluation sets from CHiME3 [39]... The test set from the en-June-22nd-2020 release was used... We use the official test set for experiments...
Hardware Specification	Yes	All experiments are conducted on a single NVIDIA A100 GPU.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies or libraries used in the implementation.
Experiment Setup	Yes	For E-BATS, we set the CMA-ES population size J = 50. The loss function coefficients are α = 1.0 and β = 2.0. We use Hmin = 0.0, Hmax = 5.0 in calculating the confidence-weighted coefficient c with cmax = 2.0 optimized over {1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0}. Evaluation is performed using two commonly used SFMs, Wav2Vec2ForCTC-Base [42] and HuBERTForCTC-Large [43]; both models are fine-tuned on LibriSpeech and then are adapted in our experiments. For T-EMA, we select γ = 0.9 for Wav2Vec2 and γ = 0.8 for HuBERT after tuning over {0.7, 0.8, 0.9, 0.95, 0.99}. We use Word Error Rate (WER) [44] as the evaluation metric... All TTA baselines are configured for per-utterance adaptation with batch size of 1.
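
The Experiment Setup row names Word Error Rate (WER) as the evaluation metric. As a rough illustration of what that metric measures, the sketch below computes WER as the word-level edit distance divided by the number of reference words. It is not the authors' scoring code: the function name `word_error_rate` is hypothetical, and splitting on whitespace with no casing or punctuation normalization is an assumption, since the quoted text does not describe the scoring script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumption: both strings are already normalized; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under per-utterance adaptation with batch size 1, a function like this would be applied to each transcript in turn. Corpus-level WER is usually reported as total errors over total reference words rather than a mean of per-utterance values; which of the two the paper uses is not stated in the quoted text.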