Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Generalizable, real-time neural decoding with hybrid state-space models

Authors: Avery Hee-Woon Ryoo, Nanda H Krishna, Ximeng Mao, Mehdi Azabou, Eva L Dyer, Matthew G Perich, Guillaume Lajoie

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate POSSM's decoding performance and inference speed on intracortical decoding of monkey motor tasks, and show that it extends to clinical applications, namely handwriting and speech decoding in human subjects.
Researcher Affiliation	Academia	1 Mila Quebec AI Institute; 2 Université de Montréal; 3 Columbia University; 4 University of Pennsylvania; 5 Canada CIFAR AI Chair
Pseudocode	No	The paper describes the architecture and methods in detailed prose and uses diagrams (e.g., Figure 2), but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Our code will be available publicly through the torch_brain Python package. Additional details and configurations will be linked on our project page (https://possm-brain.github.io).
Open Datasets	Yes	The pretraining dataset includes four NHP reaching datasets collected by different laboratories [28–31], covering three types of reaching tasks: centre-out (CO), random target (RT), and Maze. Next, we evaluated POSSM on a human handwriting dataset [23]. Finally, we evaluated POSSM on the task of human speech decoding. Unlike the reaching and handwriting tasks, which involved fixed-length context windows, speech decoding involves modelling variable-length phoneme sequences that depend on both the length of the sentence and the individual's speaking pace. We used a public dataset [3, 33]
Dataset Splits	Yes	Per session (across all datasets), 10% of the trials were used for validation and 20% were used for testing. The remaining data, including inter-trial segments, was used for training.
Hardware Specification	Yes	Inference times are computed on a workstation-class GPU (NVIDIA RTX8000). For all these results, we used a GRU backbone for POSSM. ... These results held in a CPU environment (AMD EPYC 7502 32-Core) as well... Single-session models were trained on a single NVIDIA RTX8000 GPU using LAMB with a batch size of 128, a base learning rate of 0.004, and a cosine scheduler. Multi-dataset pretraining was done on four NVIDIA H100 GPUs using LAMB with a batch size of 256... Both the uni-directional and bi-directional versions of POSSM were trained on a single NVIDIA A100 GPU (80GB) with a batch size of 16.
Software Dependencies	No	The paper mentions several software components like 'torch_brain Python package', 'AdamW optimizer', and 'LAMB optimizer', but does not provide specific version numbers for any of these to enable reproducibility of the software environment.
Experiment Setup	Yes	As the prediction target is a two-dimensional time series of either hand or cursor velocity, the loss function was chosen to be mean squared error. We increased the weight of the loss for centre-out reaching segments by a factor of 5, following POYO. ... All baselines were trained using the AdamW optimizer [68] with a batch size of 128, a base learning rate of 0.004, and a cosine scheduler for a total of 500 epochs. ... The model was trained using the LAMB optimizer [69] with a batch size of 128, base learning rates of 0.004 and 0.002 for single-session and pretrained models respectively, and a cosine scheduler for a total of 500 epochs. During training, we also applied a data augmentation scheme called unit dropout [12, 70]... For full finetuning, unit identification was first performed for 100 epochs before the rest of the model was unfrozen and trained for another 400 epochs. All models were trained for a total of 500 epochs.
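
Illustrative sketch (not the authors' code): the Dataset Splits and Experiment Setup rows quote a per-session 10% validation / 20% test split over trials and a mean squared error loss in which centre-out segments are weighted by a factor of 5. The Python below shows one way those two statements could be realised. The function names, the random shuffling with a fixed seed, the rounding of split sizes, and the normalisation of the loss by the number of samples are all assumptions made for illustration; none of them is stated in the quoted text. The sketch also ignores the inter-trial segments that the paper adds to the training data.

```python
import random


def split_trials(trial_ids, val_frac=0.10, test_frac=0.20, seed=0):
    """Split one session's trials into (train, val, test) lists.

    Hypothetical helper: the quoted text gives only the fractions,
    not how trials are ordered or selected, so a seeded shuffle is assumed.
    """
    ids = list(trial_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(val_frac * len(ids))
    n_test = round(test_frac * len(ids))
    val = ids[:n_val]
    test = ids[n_val:n_val + n_test]
    train = ids[n_val + n_test:]  # everything not held out
    return train, val, test


def weighted_mse(pred, target, is_centre_out, co_weight=5.0):
    """MSE over 2-D velocity samples with centre-out samples up-weighted.

    pred and target are sequences of (vx, vy) pairs; is_centre_out is a
    parallel sequence of booleans. Dividing by the sample count (rather
    than by the sum of weights) is an assumption.
    """
    total = 0.0
    for (px, py), (tx, ty), co in zip(pred, target, is_centre_out):
        weight = co_weight if co else 1.0
        total += weight * ((px - tx) ** 2 + (py - ty) ** 2) / 2
    return total / len(pred)


if __name__ == "__main__":
    train, val, test = split_trials(range(100))
    print(len(train), len(val), len(test))
    print(weighted_mse([(1.0, 0.0)], [(0.0, 0.0)], [True]))
```

Applying split_trials once per session, rather than once over the pooled trials, matches the "per session (across all datasets)" wording quoted in the Dataset Splits row.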