Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding

Authors: Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi, Jonathan C. Kao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	First, we incorporate large amounts of time-masking during training. On average, over 50% of each trial is masked. Second, we replace the gated recurrent unit (GRU) architecture used in the baseline algorithm with a compact Transformer...Together, these contributions reduce word error rate by over 20% and effectively mitigate performance degradations across held-out days in a real-time decoding setting while substantially lowering computational costs.
Researcher Affiliation	Academia	Ebrahim Feghhi (1,2), Shreyas Kaasyap (2), Nima Hadidi (1,2), Jonathan C. Kao (1,2,3). (1) Neuroscience Interdepartmental Program; (2) Department of Electrical & Computer Engineering; (3) Department of Computer Science; University of California, Los Angeles. Correspondence: EMAIL
Pseudocode	Yes	Algorithm 1: TIMEMASK(x, N, M)
Open Source Code	Yes	Code is available at the following link: https://github.com/ebrahimfeghhi/transformers_with_dietcorp.
Open Datasets	Yes	To accelerate progress along these lines, the Brain-to-Text Benchmark 24 was released, an open-source dataset containing intracortical neural recordings while a participant with ALS attempted to speak sentences across 24 days.
Dataset Splits	Yes	The benchmark provided train, validation, and test splits. There were 8800 sentences in train, 880 sentences in validation, and 1200 sentences in test. Train and validation sentences were recorded on 24 days (collected over almost 4 months), and test sentences were recorded on 15 out of the 24 days.
Hardware Specification	Yes	The majority of our results were generated using an Ubuntu server with three Nvidia GeForce RTX 3090 GPUs and an AMD Ryzen Threadripper 3960X 24-Core CPU with 125 GiB of memory. For the 5-gram LM and LLM fine-tuning results, we used the “g6e.16xlarge” Amazon EC2 instance with one Nvidia L40S GPU and an AMD EPYC 64-core CPU with 512 GiB of memory.
Software Dependencies	No	The paper mentions several software components like Adam optimizer, connectionist temporal classification (CTC) loss, AdamW optimizer, and the Unsloth package for QLoRA with Llama 3.1 8B. However, it does not consistently provide specific version numbers for these general software components or libraries.
Experiment Setup	Yes	The full set of hyperparameters is listed in Table 5. We used a unidirectional GRU for all results unless otherwise stated... The full set of Transformer hyperparameters is listed in Table 6.
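
The table cites Algorithm 1, TIMEMASK(x, N, M), and notes that on average over 50% of each trial is masked during training, but the algorithm body is not reproduced here. The following is a minimal sketch of generic time-masking, not the authors' implementation. It assumes that N is the number of masked spans and M is the maximum span length in time steps; both readings, the zero fill value, and the names `time_mask`, `n_masks`, and `max_mask_len` are hypothetical.

```python
import random


def time_mask(x, n_masks, max_mask_len, rng=None):
    """Zero out up to n_masks contiguous spans of time steps.

    x: list of time steps, each a list of feature values.
    n_masks: number of spans to draw (assumed meaning of N).
    max_mask_len: maximum span length in time steps (assumed meaning of M).
    rng: optional random.Random instance for reproducible masks.
    """
    rng = rng or random.Random()
    n_steps = len(x)
    # Copy every time step so the caller's input is left untouched.
    masked = [list(step) for step in x]
    for _ in range(n_masks):
        if n_steps == 0:
            break
        # Draw a span length, then a start index that keeps the span in range.
        length = rng.randint(0, min(max_mask_len, n_steps))
        start = rng.randint(0, n_steps - length)
        for i in range(start, start + length):
            # Assumption: masked steps are filled with zeros.
            masked[i] = [0.0] * len(masked[i])
    return masked


if __name__ == "__main__":
    trial = [[1.0, 1.0] for _ in range(100)]  # 100 time steps, 2 features
    out = time_mask(trial, n_masks=5, max_mask_len=20, rng=random.Random(0))
    n_masked = sum(1 for step in out if step == [0.0, 0.0])
    print(f"masked {n_masked} of {len(out)} time steps")
```

Spans may overlap in this sketch, so the masked fraction varies from trial to trial; the paper's Algorithm 1 and hyperparameter tables remain the reference for the actual procedure and values.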