Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Neural-Driven Image Editing

Authors: Pengfei Zhou, Jie Xia, Xiaopeng Peng, Wangbo Zhao, Zilong Ye, Zekai Li, Suorong Yang, Jiadong Pan, Yuanxiang Chen, Ziqiao Wang, Kai Wang, Qian Zheng, Xiaojun Chang, Gang Pan, Shurong Dong, Kaipeng Zhang, Yang You

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiment. To answer the research questions (RQs) asked in Sec. 1, we conduct a comprehensive evaluation to validate the effectiveness of LoongX on the test set of L-Mind. This section first describes the experimental setup, evaluation metrics, and implementation details, and presents results of comprehensive quantitative evaluations, detailed breakdown analyses, and qualitative assessments.
Researcher Affiliation	Collaboration	1 NUS; 2 ZJU; 3 RIT; 4 NJU; 5 USTC; 6 Shanghai AI Lab; 7 SII; 8 Hangzhou Rong Nao Tech. NUS, ZJU, RIT, NJU, and USTC are universities; Shanghai AI Lab is a research institution; Hangzhou Rong Nao Tech appears to be an industry entity. This mix indicates a collaborative affiliation.
Pseudocode	No	The paper describes methods in text and uses mathematical equations, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. Figure 2 is a data flow diagram, not pseudocode.
Open Source Code	Yes	The code and dataset are released on the project website: https://loongx1.github.io.
Open Datasets	Yes	To answer these questions, we construct L-Mind, a comprehensive multimodal dataset comprising 23,928 image pairs with synchronously collected EEG, functional near-infrared spectroscopy (fNIRS) [33], photoplethysmography (PPG), head motion, and speech signals from 12 participants conceiving image editing tasks.
Dataset Splits	Yes	We collect 23,928 editing samples (22,728 training, 1,200 testing) from 12 participants using the setup depicted in Fig. 2.
Hardware Specification	Yes	All models are trained on eight NVIDIA H100 GPUs. Text prompts are embedded by T5-XXL [59] and CLIP [60]; neural signal streams are encoded by the proposed CS3. Unless stated otherwise, EEG montage (Fz, Fp2, O2, Pz, Cz) is sampled at 256 Hz and down-sampled to 32 Hz after band-pass filtering. Inference runs at 8 steps with classifier-free guidance w = 4. We choose OminiControl [48] as our baseline as it supports the text-conditioned image-editing based on DiTs. We also implement LoongX using only neural signals (EEG, fNIRS, PPG and Motion) and using both text prompts and neural signals. We load the pretrained weights from FLUX.1-dev and use low-rank approximation (LoRA) for fine-tuning (learning rate 1.0, weight decay 0.01).
Software Dependencies	No	The paper mentions specific models like T5-XXL and CLIP, and frameworks like Diffusion Transformer. It also refers to Lab Recorder software. However, it does not provide specific version numbers for the programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or other core software components with versions that would be necessary to replicate the environment.
Experiment Setup	Yes	Unless stated otherwise, EEG montage (Fz, Fp2, O2, Pz, Cz) is sampled at 256 Hz and down-sampled to 32 Hz after band-pass filtering. Inference runs at 8 steps with classifier-free guidance w = 4. We choose OminiControl [48] as our baseline as it supports the text-conditioned image-editing based on DiTs. We load the pretrained weights from FLUX.1-dev and use low-rank approximation (LoRA) for fine-tuning (learning rate 1.0, weight decay 0.01).
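
For readers attempting a replication, the numeric values quoted in the table can be gathered into one configuration record. The sketch below is illustrative only: the class name `ReportedSetup` and all field names are hypothetical and do not come from the paper or its code release. Only the values are taken from the quoted text. The two derived properties check the arithmetic implied by the quotes: the factor for down-sampling 256 Hz to 32 Hz, and the sum of the training and test splits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; all names here are hypothetical."""

    eeg_channels: tuple = ("Fz", "Fp2", "O2", "Pz", "Cz")
    eeg_rate_hz: int = 256          # raw EEG sampling rate
    eeg_target_rate_hz: int = 32    # rate after band-pass filtering and down-sampling
    inference_steps: int = 8
    guidance_scale: float = 4.0     # classifier-free guidance w
    learning_rate: float = 1.0      # as quoted for LoRA fine-tuning
    weight_decay: float = 0.01
    num_gpus: int = 8               # NVIDIA H100
    train_samples: int = 22_728
    test_samples: int = 1_200

    @property
    def decimation_factor(self) -> int:
        """Integer ratio between the raw and target EEG sampling rates."""
        if self.eeg_rate_hz % self.eeg_target_rate_hz:
            raise ValueError("target rate must divide the source rate")
        return self.eeg_rate_hz // self.eeg_target_rate_hz

    @property
    def total_samples(self) -> int:
        """Training plus test samples; should match the reported dataset size."""
        return self.train_samples + self.test_samples


setup = ReportedSetup()
print(setup.decimation_factor)  # 8
print(setup.total_samples)      # 23928
```

The total of 23,928 matches the dataset size quoted under Open Datasets. The band-pass cutoff frequencies, the optimizer, and the software versions are not given in the quoted text, so they are deliberately left out of the record.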