Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Self-Calibrating BCIs: Ranking and Recovery of Mental Targets Without Labels

Authors: Jonathan Grizou, Carlos De la Torre-Ortiz, Tuukka Ruotsalo

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on naturalistic images of faces demonstrate that CURSOR can (1) predict image similarity scores that correlate with human perceptual judgments without any label information, (2) use these scores to rank stimuli against an unknown mental target, and (3) generate new stimuli indistinguishable from the unknown mental target (validated via a user study, N = 53).
Researcher Affiliation	Collaboration	Jonathan Grizou Griz AI University of Glasgow EMAIL Carlos de la Torre-Ortiz* University of Helsinki EMAIL Tuukka Ruotsalo LUT University University of Copenhagen EMAIL
Pseudocode	Yes	Algorithm 1 CURSOR: Scoring Function
Open Source Code	Yes	Code and Data https://github.com/jgrizou/neurips-self-calibrating-bci/
Open Datasets	Yes	We release the brain response data set (N = 29), associated face images used as stimuli data, and a codebase to initiate further research on this novel task. Code and Data https://github.com/jgrizou/neurips-self-calibrating-bci/
Dataset Splits	Yes	Each S(h) estimate is the average of a 10-fold 90%/10% randomly partitioned cross-validation procedure.
Hardware Specification	Yes	The computations were performed locally on GNU/Linux and Mac OS-based machines: A Lenovo Thinkpad t480 running Ubuntu 22.04, an i5-7300U Intel CPU at a base frequency of 2.6 GHz, Intel HD Graphics 620, and 32GB / 2400MHz DDR4 RAM; A custom machine running Linux 6.10.3, an AMD Ryzen 7 7840U 16 core at a base frequency of 3.3 GHz with Radeon 780M Graphics, and 64 GB / DDR5-5600 RAM; A Mac Book Pro running Mac OS Sonoma 14.3, an Apple M1 Max with 10 cores and 64GB RAM. Most computationally intensive tasks were executed on an on-premises Alma Linux 8.7 cluster with a SLURM scheduler and AMD EPYC 7452 CPUs.
Software Dependencies	No	There are no specific package version requirements. We use Python 3.11, the oldest stable version at submission time supporting advanced enumerations for quality of life improvements, and therefore dependency versions are flexible. Details on the packages can be found in the dependencies file in the code.
Experiment Setup	Yes	We compared performance using Linear Regression (LR), Support Vector Regression (SVR), and Multi-Layer Perceptron (MLP) as implemented in the Scikit-learn library [56]. The relative model simplicity allows for rigorous evaluation within our computational constraints, and linear models are known to perform well in EEG tasks [55, 69, 9, 42, 8]. When training estimator parameters θ, both EEG e and distances d are standardized by removing the mean and scaling to unit variance for each feature. ... For the Best MLP model, identity activation generally outperformed ReLU, with moderate alpha values (0.1) yielding better results. Simpler architectures (2-3 layers) often performed comparably to or better than more complex ones. The best-performing configuration consisted of identity activation, alpha = 0.1, and two hidden layers of size 100 with an adaptive learning rate.
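
The Dataset Splits and Experiment Setup rows describe a standardize-then-regress procedure evaluated with 10-fold 90%/10% cross-validation. A minimal sketch of how such a setup might look in Scikit-learn follows; it is not the authors' code. The data are synthetic stand-ins for EEG features and distances, and the random seeds, `max_iter`, the R² scoring metric, and the use of `KFold` with shuffling are assumptions. `solver="sgd"` is also an assumption, chosen because Scikit-learn only applies `learning_rate="adaptive"` with that solver.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins (assumption): X plays the role of EEG features e,
# y the role of distances d. Real data would come from the released dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = X @ rng.normal(size=16) + 0.1 * rng.normal(size=200)

# Reported best MLP configuration: identity activation, alpha = 0.1,
# two hidden layers of size 100, adaptive learning rate.
mlp = MLPRegressor(
    hidden_layer_sizes=(100, 100),
    activation="identity",
    alpha=0.1,
    learning_rate="adaptive",
    solver="sgd",      # assumption: "adaptive" only takes effect with SGD
    max_iter=500,      # assumption: not stated in the excerpt
    random_state=0,
)

# Standardize inputs inside the pipeline and the target via the wrapper,
# so both scalers are fit on the training portion of each fold only.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), mlp),
    transformer=StandardScaler(),
)

# Ten random folds; each fold holds out 10% and trains on the other 90%.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
mean_score = scores.mean()  # the reported estimate is an average over folds
```

Wrapping the scalers in the pipeline keeps the held-out 10% out of the mean and variance estimates, which avoids leaking test information into training. Swapping `mlp` for `LinearRegression()` or `SVR()` would cover the other two model families named in the table.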