Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Brain-Like Processing Pathways Form in Models With Heterogeneous Experts

Authors: Jack Cook, Danyal Akarca, Rui Costa, Jascha Achterberg

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we use an extension of the Heterogeneous Mixture-of-Experts architecture to show that heterogeneous regions do not form processing pathways by themselves, implying that the brain likely implements specific constraints which result in the reliable formation of pathways. We identify three biologically relevant inductive biases that encourage pathway formation: a routing cost imposed on the use of more complex regions, a scaling factor that reduces this cost when task performance is low, and randomized expert dropout. When comparing our resulting Mixture-of-Pathways model with the brain, we observe that the artificial pathways in our model match how the brain uses cortical and subcortical systems to learn and solve tasks of varying difficulty. ... We train our models over 10 epochs, each containing 1000 training steps. At each training step, models are given a 128 × 350 × 115 matrix of input data, representing 128 batches of task sequences that are 350 timesteps long, with 115 features at each timestep. Models are trained with a cross entropy loss L_{response,i} for the correct response during the response period of task i.
Researcher Affiliation | Academia | Jack Cook¹, Danyal Akarca², Rui Ponte Costa¹, Jascha Achterberg¹. ¹Centre for Neural Circuits and Behaviour, University of Oxford; ²Department of Electrical and Electronic Engineering, Imperial College London. Joint senior authors
Pseudocode | Yes | Algorithm 1: Mixture-of-Pathways training protocol. Full details in Appendix A.1.
Open Source Code | Yes | We provide our implementation at https://github.com/jackcook/mixture-of-pathways.
Open Datasets | Yes | To evaluate how pathways are formed and used across tasks with different characteristics, we use the Mod-Cog task set, which contains 82 time-series-based cognitive tasks [24]. This is an expansion of the popular NeuroGym framework [48]... Neurogym: An open resource for developing and sharing neuroscience tasks, 2022. OSF.
Dataset Splits | No | We train our models over 10 epochs, each containing 1000 training steps. At each training step, models are given a 128 × 350 × 115 matrix of input data, representing 128 batches of task sequences that are 350 timesteps long, with 115 features at each timestep. These task sequences contain many individual tasks: the average task is about 20 timesteps long, meaning that in each batch, models observe about 27 trials of each task. We evaluate each model on 50 trials per task.
Hardware Specification | Yes | Training one model takes roughly 1 hour on a single NVIDIA T4 GPU.
Software Dependencies | No | All our simulations are implemented with PyTorch, as described in the extended implementation details in the appendix. All models are optimized with the Schedule-Free variant of the AdamW optimizer [47] using a learning rate of 0.01, betas of (0.9, 0.999), and no weight decay.
Experiment Setup | Yes | We train our models over 10 epochs, each containing 1000 training steps. ... All models are optimized with the Schedule-Free variant of the AdamW optimizer [47] using a learning rate of 0.01, betas of (0.9, 0.999), and no weight decay. ... We found that setting α = 10^-5 balanced these priorities well, and used this value for all of the experiments in this work. ... we train 11 groups of models with β values ranging from 0, 0.1, ..., 0.9, 1 using Equation 3, with 10 models in each group.
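
The figures quoted in the Dataset Splits, Hardware Specification, and Experiment Setup rows can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not taken from the authors' repository: the `Setup` class, its field names, and the helper functions are hypothetical, and the trial count assumes the 82 tasks are sampled uniformly, which the quoted text does not state. The GPU-hour figure assumes every model in the β sweep costs the quoted "roughly 1 hour".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """Hyperparameters as quoted in the table (field names are hypothetical)."""
    epochs: int = 10
    steps_per_epoch: int = 1000
    batch_size: int = 128        # task sequences per training step
    timesteps: int = 350         # length of each task sequence
    features: int = 115          # input features per timestep
    num_tasks: int = 82          # size of the Mod-Cog task set
    avg_task_len: int = 20       # "about 20 timesteps long"
    learning_rate: float = 0.01
    betas: tuple = (0.9, 0.999)
    weight_decay: float = 0.0
    alpha: float = 1e-5
    models_per_group: int = 10
    hours_per_model: float = 1.0  # "roughly 1 hour" on one NVIDIA T4


def beta_grid():
    """The 11 beta values 0, 0.1, ..., 0.9, 1 used for the sweep."""
    return [i / 10 for i in range(11)]


def trials_per_task_per_batch(s):
    """Estimate trials of each task per batch, assuming uniform task sampling."""
    trials_per_sequence = s.timesteps / s.avg_task_len   # 350 / 20 = 17.5
    trials_per_batch = s.batch_size * trials_per_sequence  # 128 * 17.5 = 2240
    return trials_per_batch / s.num_tasks                 # 2240 / 82


def total_training_steps(s):
    """Training steps per model: epochs times steps per epoch."""
    return s.epochs * s.steps_per_epoch


def total_models(s):
    """Models in the beta sweep: one group per beta value."""
    return len(beta_grid()) * s.models_per_group


def sweep_gpu_hours(s):
    """Rough GPU-hour estimate for the beta sweep alone."""
    return total_models(s) * s.hours_per_model


if __name__ == "__main__":
    s = Setup()
    print("input shape:", (s.batch_size, s.timesteps, s.features))
    print("trials per task per batch:", round(trials_per_task_per_batch(s), 1))
    print("steps per model:", total_training_steps(s))
    print("models in beta sweep:", total_models(s))
    print("approx. GPU-hours for sweep:", sweep_gpu_hours(s))
```

Under these assumptions the estimate works out to 2240 / 82, a little over 27 trials per task per batch, which is consistent with the "about 27 trials" quoted above. The β sweep comes to 110 models, so on the order of 110 T4 GPU-hours if each run takes about an hour.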