Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis

Authors: Berkay Döner, Thorir Mar Ingolfsson, Luca Benini, Yawei Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	LUNA transfers effectively to four downstream tasks: abnormality detection, artifact rejection, slowing classification, and emotion recognition. It demonstrates highly competitive performance across several benchmarks, achieving state-of-the-art results on TUAR and TUSL, e.g., 0.921 AUROC on TUAR, while reducing FLOPs by 300× and trimming GPU memory use by up to 10×. 4.1 Experimental Setup; 4.2 Downstream Task Performance; 4.3 Computational Efficiency; 4.4 Ablation Studies; 4.5 Latent Space Analysis
Researcher Affiliation	Academia	Berkay Döner¹, Thorir Mar Ingolfsson¹, Luca Benini¹, Yawei Li¹. ¹Integrated Systems Laboratory, ETH Zürich, Switzerland. Correspondence to EMAIL.
Pseudocode	No	The paper describes the methodology and architecture in Section 3 and its subsections, but it does not include any explicitly labeled pseudocode or algorithm blocks. Figure 1 provides an overview diagram, not pseudocode.
Open Source Code	Yes	Code is available at https://github.com/pulp-bio/biofoundation
Open Datasets	Yes	We pre-train LUNA on a combined corpus of Temple University Hospital EEG Corpus (TUEG) [9] and the Siena Scalp EEG Database [10], spanning recordings with 20, 22, and 29 channels amounting to over 21,900 hours of EEG data (see Table 11).
Dataset Splits	Yes	For the TUAB dataset, we use the official train-test split. As TUSL and TUAR lack official subject-wise test splits, we follow recent leading work (e.g., EEGFormer [14]) and adopt an 80%/10%/10% randomized sample-level split for train/val/test to allow direct, like-for-like comparison. For SEED-V, fifteen trials are divided equally into train, validation, and test sets for each session.
Hardware Specification	Yes	All experiments were conducted on a cluster of eight NVIDIA A100 GPUs, using Python 3.11.6 and PyTorch 2.4.1 with CUDA 12.1. Training utilizes bf16 mixed-precision.
Software Dependencies	Yes	All experiments were conducted on a cluster of eight NVIDIA A100 GPUs, using Python 3.11.6 and PyTorch 2.4.1 with CUDA 12.1. Training utilizes bf16 mixed-precision.
Experiment Setup	Yes	For LUNA, the input EEG is segmented into patches, each consisting of 40 timestamps. For most datasets, EEG recordings are sliced into non-overlapping 5-second segments to form individual training/evaluation samples. The SEED-V dataset uses its default 1-second sample duration. We optimize binary cross-entropy loss for TUAB and cross-entropy loss for other datasets. We report the mean and standard deviation of results obtained across three different random seeds. Preprocessing: We apply a minimal, standardized preprocessing pipeline to all EEG data. Signals are first bandpass filtered between 0.1 Hz and 75 Hz. A notch filter (50 Hz or 60 Hz) is applied to remove power-line interference. All signals are then resampled to 256 Hz. For the TUEG, TUAB, TUAR, and TUSL datasets we construct a bipolar ("double-banana") montage by differencing predefined longitudinal electrode pairs provided in the dataset documentation; the full list of channel pairs used is given in Appendix A.7. Siena and SEED-V are processed in unipolar format. Finally, each channel within each sample is normalized using z-score normalization. Detailed hyperparameters for pre-training and fine-tuning are provided in Appendix A.3.
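
The preprocessing steps quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration using NumPy and SciPy, not the authors' implementation. The function names `preprocess` and `segment_and_normalize`, the Butterworth filter order, the zero-phase filtering, the notch quality factor, and the polyphase resampling method are all assumptions; only the cutoff frequencies, the 256 Hz target rate, the 5-second segment length, the 40-timestamp patch size, and the per-channel z-scoring come from the quoted text.

```python
import math

import numpy as np
from scipy import signal

TARGET_FS = 256       # Hz, target sampling rate from the quoted setup
SEGMENT_SECONDS = 5   # non-overlapping segment length used for most datasets
PATCH_LEN = 40        # timestamps per patch


def preprocess(eeg, fs, notch_hz=50.0, pairs=None):
    """Filter, resample, and optionally re-reference a (channels, samples) array.

    `pairs` is a hypothetical list of (i, j) channel-index pairs; each output
    channel is eeg[i] - eeg[j], mimicking a bipolar montage.
    """
    # Band-pass 0.1-75 Hz. Order 4 and zero-phase filtering are assumptions.
    sos = signal.butter(4, [0.1, 75.0], btype="bandpass", fs=fs, output="sos")
    x = signal.sosfiltfilt(sos, eeg, axis=-1)

    # Notch at the power-line frequency (50 or 60 Hz). Q=30 is an assumption.
    b, a = signal.iirnotch(notch_hz, Q=30.0, fs=fs)
    x = signal.filtfilt(b, a, x, axis=-1)

    # Resample to 256 Hz with a polyphase filter (method is an assumption).
    g = math.gcd(int(fs), TARGET_FS)
    x = signal.resample_poly(x, TARGET_FS // g, int(fs) // g, axis=-1)

    # Bipolar montage: difference the given electrode pairs, if any.
    if pairs is not None:
        x = np.stack([x[i] - x[j] for i, j in pairs])
    return x


def segment_and_normalize(x, eps=1e-8):
    """Cut into 5-second samples, z-score each channel per sample, and patch."""
    n_ch = x.shape[0]
    seg_len = TARGET_FS * SEGMENT_SECONDS          # 1280 samples
    n_seg = x.shape[-1] // seg_len                 # trailing remainder is dropped
    segs = x[:, : n_seg * seg_len].reshape(n_ch, n_seg, seg_len)
    segs = segs.transpose(1, 0, 2)                 # (segments, channels, time)

    # z-score normalization of each channel within each sample.
    mean = segs.mean(axis=-1, keepdims=True)
    std = segs.std(axis=-1, keepdims=True)
    segs = (segs - mean) / (std + eps)

    # Split the time axis into patches of 40 timestamps (1280 / 40 = 32).
    patches = segs.reshape(n_seg, n_ch, seg_len // PATCH_LEN, PATCH_LEN)
    return segs, patches
```

As a usage example, a 30-second, 4-channel recording sampled at 250 Hz, passed through `preprocess` with two channel pairs and then through `segment_and_normalize`, yields six samples of shape (2, 1280), each divided into 32 patches of 40 timestamps. The actual channel pairs for the TUH datasets are listed in the paper's Appendix A.7 and are not reproduced here.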