Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding

Authors: Haonan Wang, Jingyu Lu, Hongrui Li, Xiaomeng Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that ZEBRA significantly outperforms zero-shot baselines and achieves performance comparable to fully finetuned models on several metrics. Quantitative Results. We evaluate ZEBRA against representative methods across various training regimes on the Natural Scenes Dataset, with results averaged over subjects 1, 2, 5, and 7. We conduct ablation studies on Subject 1 (trained on Subjects 2-8) to assess the contribution of each component in ZEBRA.
Researcher Affiliation	Academia	Haonan Wang, Jingyu Lu, Hongrui Li, Xiaomeng Li The Hong Kong University of Science and Technology EMAIL, EMAIL
Pseudocode	No	The paper describes methods in text and uses figures (e.g., Figure 2: Core idea of ZEBRA, Figure 3: ZEBRA consists of two key components) to illustrate the architecture and flow, but does not include structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code and model weights are available at: https://github.com/xmed-lab/ZEBRA.
Open Datasets	Yes	Dataset. We use the Natural Scenes Dataset (NSD) [17] for both training and evaluation. NSD contains visual image stimulus and corresponding fMRI recordings of 8 subjects, with each subject viewing 8,000-9,000 images. [17] E. J. Allen, G. St-Yves, Y. Wu, J. L. Breedlove, J. S. Prince, L. T. Dowdle, M. Nau, B. Caron, F. Pestilli, I. Charest, et al., A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence, Nature Neuroscience, vol. 25, no. 1, pp. 116-126, 2022.
Dataset Splits	Yes	For each test subject, we use all other 7 subjects to train the model and tested on the unseen subject with unseen test split. The final results were tested on subjects 1, 2, 5 or 7, since these subjects complete all scanning sessions, sharing the same 982 images as testing data.
Hardware Specification	Yes	All experiments were conducted for 60 epochs using 8 NVIDIA RTX H800 GPUs with a total batch size of 128 (16 samples per GPU).
Software Dependencies	No	The paper mentions using specific optimizers like AdamW and generative models like SDXL unCLIP, but does not provide specific version numbers for software libraries or frameworks such as Python, PyTorch, or TensorFlow.
Experiment Setup	Yes	All experiments were conducted for 60 epochs using 8 NVIDIA RTX H800 GPUs with a total batch size of 128 (16 samples per GPU). We adopt the AdamW optimizer [48] with a learning rate of 1e-4, following the OneCycle learning rate schedule [49]. In the inference stage, we follow MindEye2's two-stage decoding process. First, the predicted image latents are decoded into coarse images using SDXL unCLIP. These coarse outputs are then refined using base SDXL in image-to-image mode, guided by predicted captions. The refinement starts from a noised version of the coarse image, skipping the first 50% of diffusion steps. ϑ is set to 30 following previous methods [4].
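
The leave-one-subject-out protocol quoted under Dataset Splits and the batch arithmetic quoted under Hardware Specification can be sketched as follows. This is a minimal illustration only, not the authors' code: the function and constant names are hypothetical, and it assumes the eight NSD subjects are numbered 1 to 8.

```python
# Illustrative sketch; all names here are hypothetical, not from the ZEBRA repo.
ALL_SUBJECTS = list(range(1, 9))   # assumption: NSD subjects numbered 1..8
TEST_SUBJECTS = [1, 2, 5, 7]       # subjects reported as completing all sessions


def leave_one_subject_out(test_subject, all_subjects=ALL_SUBJECTS):
    """Return the training subjects for a fold that holds out `test_subject`."""
    if test_subject not in all_subjects:
        raise ValueError(f"unknown subject: {test_subject}")
    return [s for s in all_subjects if s != test_subject]


def per_gpu_batch_size(total_batch_size, num_gpus):
    """Split a global batch evenly across GPUs, rejecting uneven splits."""
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch_size // num_gpus


# One fold per held-out test subject: train on the other seven.
folds = {s: leave_one_subject_out(s) for s in TEST_SUBJECTS}

for test_subject, train_subjects in folds.items():
    print(f"test on subject {test_subject}, train on {train_subjects}")

print("samples per GPU:", per_gpu_batch_size(128, 8))
```

Each fold trains on seven subjects and evaluates on the held-out one, which matches the zero-shot cross-subject setting described in the table; the reported 128-sample global batch over 8 GPUs corresponds to 16 samples per GPU.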