Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding
Authors: Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi, Jonathan C. Kao
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we incorporate large amounts of time-masking during training. On average, over 50% of each trial is masked. Second, we replace the gated recurrent unit (GRU) architecture used in the baseline algorithm with a compact Transformer...Together, these contributions reduce word error rate by over 20% and effectively mitigate performance degradations across held-out days in a real-time decoding setting while substantially lowering computational costs. |
| Researcher Affiliation | Academia | Ebrahim Feghhi (1,2), Shreyas Kaasyap (2), Nima Hadidi (1,2), Jonathan C. Kao (1,2,3). (1) Neuroscience Interdepartmental Program, (2) Department of Electrical & Computer Engineering, (3) Department of Computer Science, University of California, Los Angeles. Correspondence: EMAIL |
| Pseudocode | Yes | Algorithm 1: TIMEMASK(x, N, M) |
| Open Source Code | Yes | Code is available at the following link: https://github.com/ebrahimfeghhi/transformers_with_dietcorp. |
| Open Datasets | Yes | To accelerate progress along these lines, the Brain-to-Text Benchmark 24 was released, an open-source dataset containing intracortical neural recordings while a participant with ALS attempted to speak sentences across 24 days. |
| Dataset Splits | Yes | The benchmark provided train, validation, and test splits. There were 8800 sentences in train, 880 sentences in validation, and 1200 sentences in test. Train and validation sentences were recorded on 24 days (collected over almost 4 months), and test sentences were recorded on 15 out of the 24 days. |
| Hardware Specification | Yes | The majority of our results were generated using an Ubuntu server with three Nvidia GeForce RTX 3090 GPUs and an AMD Ryzen Threadripper 3960X 24-Core CPU with 125 GiB of memory. For the 5-gram LM and LLM fine-tuning results, we used the “g6e.16xlarge” Amazon EC2 instance with one Nvidia L40S GPU and an AMD EPYC 64-core CPU with 512 GiB of memory. |
| Software Dependencies | No | The paper mentions several software components like Adam optimizer, connectionist temporal classification (CTC) loss, AdamW optimizer, and the Unsloth package for QLoRA with Llama 3.1 8B. However, it does not consistently provide specific version numbers for these general software components or libraries. |
| Experiment Setup | Yes | The full set of hyperparameters is listed in Table 5. We used a unidirectional GRU for all results unless otherwise stated... The full set of Transformer hyperparameters is listed in Table 6. |
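
The Research Type and Pseudocode rows refer to time-masking during training and to Algorithm 1, TIMEMASK(x, N, M), but the table does not reproduce the algorithm. The sketch below illustrates generic time masking in the SpecAugment style, not the paper's Algorithm 1. It assumes that N is the number of masks and M is the maximum mask length in time steps; both readings, along with the names `time_mask`, `n_masks`, and `max_mask_len`, are hypothetical and should be checked against the paper.

```python
import random


def time_mask(x, n_masks, max_mask_len, rng=None):
    """Zero out up to n_masks random contiguous time spans of a trial.

    x: list of T feature vectors (each a list of floats).
    n_masks, max_mask_len: ASSUMED meanings of N and M; not confirmed
    by the summary above.
    """
    rng = rng or random.Random()
    t_len = len(x)
    masked = [list(frame) for frame in x]  # copy so the input is untouched
    for _ in range(n_masks):
        # randint is inclusive on both ends, so a span may have length 0.
        length = rng.randint(0, min(max_mask_len, t_len))
        start = rng.randint(0, t_len - length)
        for t in range(start, start + length):
            masked[t] = [0.0] * len(masked[t])
    return masked


# Usage: mask a toy trial of 100 time steps with 2 features each.
trial = [[1.0, 2.0] for _ in range(100)]
masked_trial = time_mask(trial, n_masks=20, max_mask_len=15, rng=random.Random(0))
masked_fraction = sum(all(v == 0.0 for v in f) for f in masked_trial) / len(masked_trial)
```

Because spans may overlap, the fraction of masked time steps depends on both parameters and on the trial length; the values used here are illustrative and are not the paper's hyperparameters.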