Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Is Limited Participant Diversity Impeding EEG-based Machine Learning?

Authors: Philipp Bomatter, Henry Gouk

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We address these challenges by systematically investigating the data scaling behaviour of different EEG machine learning models (TCN, mAtt, LaBraM) across a range of large datasets (TUAB, CAUEEG, PhysioNet), each with more than 1,000 participants, and three tasks (EEG normality prediction, dementia diagnosis, sleep staging). Crucially, we control both the number of participants in the training data and the amount of data per participant, allowing us to disentangle the impact of participant diversity from the overall sample size.
Researcher Affiliation	Academia	Philipp Bomatter School of Informatics University of Edinburgh EMAIL Henry Gouk School of Informatics University of Edinburgh EMAIL
Pseudocode	No	The paper describes experimental procedures and model architectures in detail through text, figures, and tables, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code for our experiments is publicly available online.[1] ... [1] https://github.com/bomatter/participant-diversity-paper ... The research code for our experiments is publicly available at https://github.com/bomatter/participant-diversity-paper.
Open Datasets	Yes	PhysioNet: The dataset referred to as PhysioNet in this study corresponds to the data used in the PhysioNet/Computing in Cardiology Challenge 2018 [28, 29]... The data is available under the Open Data Commons Attribution License (ODC-By) v1.0.
Dataset Splits	Yes	Each dataset was split into train, validation, and test splits with no participant overlap. Subsequently, we randomly subsampled training datasets with a fixed number of participants and overall sample size from the train split as visualised by the grid depicted in Figure 1B... The official train-test splits were used for TUAB and CAUEEG, and in both cases, part of the train split was set aside for validation. For PhysioNet, where labels for the official test split are not publicly available, an age and sex stratified split into train, val, and test was created on the participant level (i.e. without overlapping participants). The exact splits can be reproduced with the shared research code.
Hardware Specification	Yes	All models were trained on an internal cluster with RTX 2080 Ti (11GB) and A40 (48GB) GPUs.
Software Dependencies	No	Dependencies notably include PyTorch [49], MNE-Python [50] and MNE-BIDS [51] for data harmonisation and preprocessing, TorchEEG [52] for the implementation of the data loader, Braindecode [31] for the implementation of the TCN model and data augmentations, and the repositories of the original LaBraM [15] and mAtt [32] models. While software names are listed, specific version numbers (e.g., PyTorch 1.9) for these dependencies are not explicitly provided in the text.
Experiment Setup	Yes	Models were trained using AdamW (betas=(0.9, 0.999), weight_decay=0.01), global gradient norm clipping (max_norm = 1.0), and a cross-entropy loss. The learning rate was set to 1e-3 for mAtt and TCN and tuned for LaBraM (see Section 3.6 and Table 1 for details and the used learning rates respectively). For the CAUEEG dataset, we used class weights in the loss function to account for class imbalance. Since the amount and diversity of the training data differed substantially between trials, early stopping (on the fixed validation set) was used to prevent overfitting, while allowing sufficient training time if the amount of data was larger. Specifically, the validation loss was evaluated every 500 batches and a patience of 5 was used to trigger early stopping.
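
The early-stopping rule quoted in the Experiment Setup row (validation loss evaluated every 500 batches, patience of 5) can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' implementation: the names `EarlyStopping` and `batches_until_stop` are hypothetical, and the sketch assumes that any strict decrease in validation loss resets the patience counter, a detail the quoted text does not specify.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    """Hypothetical helper: stop after `patience` evaluations without improvement."""

    patience: int = 5                # value quoted in the Experiment Setup row
    best_loss: float = float("inf")  # lowest validation loss seen so far
    bad_evals: int = 0               # consecutive evaluations without improvement

    def update(self, val_loss: float) -> bool:
        """Record one validation result; return True if training should stop."""
        if val_loss < self.best_loss:
            # Assumption: any strict improvement resets the counter.
            self.best_loss = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


def batches_until_stop(val_losses, eval_every=500, patience=5):
    """Return the batch count at which early stopping triggers, or None.

    `val_losses` holds one validation loss per evaluation, where an
    evaluation happens every `eval_every` training batches.
    """
    stopper = EarlyStopping(patience=patience)
    for n_evals, loss in enumerate(val_losses, start=1):
        if stopper.update(loss):
            return n_evals * eval_every
    return None
```

Under these assumptions, a run whose validation loss last improves at the third evaluation would stop at the eighth evaluation, that is, after 4,000 batches; a run whose loss keeps improving never triggers the rule.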