Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Brain-tuning Improves Generalizability and Efficiency of Brain Alignment in Speech Models

Authors: Omer Moussa, Mariya Toneva

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive experiments demonstrate the multiple benefits of brain-tuning. First, our method reduces the amount of fMRI data required to achieve reliable brain alignment for new participants by a factor of five, significantly lowering the required fMRI data for a robust estimate of brain alignment. Second, brain-tuning yields up to a 50% increase in the overall brain alignment, and 3) strong generalization to new unseen datasets. Furthermore, this multi-participant brain-tuning additionally improves downstream performance on semantic tasks, suggesting that training using brain data from multiple participants leads to more generalizable semantic representations. Taken together, these findings demonstrate a bidirectional benefit between neuroscience and AI, helping bridge the gap between the two fields.
Researcher Affiliation	Academia	Omer Moussa, Max Planck Institute for Software Systems, Saarbrücken, Germany, EMAIL; Mariya Toneva, Max Planck Institute for Software Systems, Saarbrücken, Germany, EMAIL
Pseudocode	No	The paper describes the brain-tuning approach and training details in Section 3.3, but it does so using descriptive text and a diagram (Figure 1), not structured pseudocode or algorithm blocks.
Open Source Code	Yes	We make our code and models publicly available at https://github.com/bridge-ai-neuro/multi-brain-tuning.
Open Datasets	Yes	For brain-tuning and evaluation, we use the Moth Radio Hour dataset [LeBel et al., 2024], which is the largest per-participant fMRI dataset that is publicly available. This dataset consists of fMRI recordings of 8 participants who listened to autobiographical stories from the Moth Radio Hour podcast. To test cross-dataset generalization, we use a subset of the Narratives fMRI dataset [Nastase et al., 2021], in which 16 participants listened to a 56-minute fictional short story (with TR = 1.5s).
Dataset Splits	Yes	We split the fMRI stories into 2 validation stories, 1 held-out test story (exclusively used for evaluation and never during training), and the remaining 81 stories for training.
Hardware Specification	Yes	Training takes approx. 6h on two NVIDIA A40 48GB GPUs.
Software Dependencies	Yes	To spatially align participants and be able to parse specific regions of interest (ROIs), we project each participant's data to a common cortical surface with FreeSurfer v7.
Experiment Setup	Yes	We use a LoRA rank = 8, which corresponds to 0.625% of the total model parameters. Increasing the rank beyond 8 did not help the model (Sec. 4.4). We used a learning rate of 1 × 10⁻⁴ with a 10% warmup period and a linear decay. We split the fMRI stories into 2 validation stories, 1 held-out test story (exclusively used for evaluation and never during training), and the remaining 81 stories for training. At tuning time, we use a batch size of 128 samples of (audio, fMRI response) pairs (see Sec. 3.2) and train the model for 30 epochs.
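
The learning-rate schedule quoted in the Experiment Setup row (peak rate of 1 × 10⁻⁴, 10% warmup, linear decay) can be illustrated with a small sketch. This is not the authors' code: the function name `lr_at_step`, the per-step granularity, the warmup starting from zero, and the decay ending at zero are all assumptions made for illustration, since the quoted text does not specify them.

```python
def lr_at_step(step, total_steps, peak_lr=1e-4, warmup_frac=0.10):
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup from 0 to peak_lr over the first
    warmup_frac of training, then linear decay from peak_lr to 0.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")

    warmup_steps = int(round(warmup_frac * total_steps))

    # Warmup phase: rate grows linearly with the step index.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps

    # Decay phase: rate shrinks linearly until the final step.
    decay_steps = total_steps - warmup_steps
    if decay_steps == 0:
        return peak_lr
    return peak_lr * (total_steps - step) / decay_steps


if __name__ == "__main__":
    # Illustrative step count only; the real count depends on the
    # number of training samples, which the table does not state.
    total = 1000
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at_step(s, total))
```

Under these assumptions the rate reaches its peak at the end of the warmup window and falls to half the peak midway through the decay phase. The actual implementation in the linked repository may differ, for example by using a library scheduler or a nonzero final rate.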