Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

Authors: Ozgur Kara, Harris Nisar, James M. Rehg

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive evaluation shows that DiffEye not only achieves state-of-the-art performance in scanpath generation but also enables, for the first time, the generation of continuous eye movement trajectories. We evaluate the results using four standard metrics commonly used in the eye-tracking literature: Levenshtein Distance, Discrete Fréchet Distance (DFD), Dynamic Time Warping (DTW), and Time Delay Embedding (TDE).
Researcher Affiliation	Academia	Ozgur Kara Harris Nisar James M. Rehg EMAIL University of Illinois Urbana-Champaign
Pseudocode	Yes	Algorithm 1 Evaluation of Scanpath and Continuous Trajectory Generation Metrics
Open Source Code	No	We are going to release the code upon acceptance of the paper.
Open Datasets	Yes	We use the MIT1003 dataset [18] to train and evaluate our model. To the best of our knowledge, it is the only publicly available dataset that provides raw eye-tracking data for natural images obtained during a free-viewing task. The dataset contains recordings from 15 subjects who free-viewed 1,003 images for 3 seconds each, resulting in a total of 15,045 eye-tracking sequences. Data was collected using the ETL 400 ISCAN system at a sampling rate of 240 Hz. https://people.csail.mit.edu/tjudd/WherePeopleLook/index.html. We evaluate our results on the test sets of the MIT1003 and OSIE [53] datasets. https://github.com/chenxy99/Scanpaths/tree/main/OSIE
Dataset Splits	Yes	The dataset is split into training (90%) and testing (10%) sets based on stimuli, ensuring that images used for evaluation were not seen during training.
Hardware Specification	Yes	All experiments were conducted using an NVIDIA RTX A6000 GPU.
Software Dependencies	No	The paper mentions several models and optimizers (e.g., Adam optimizer, DDPM, DDIM), but does not specify software libraries or their version numbers used for implementation, such as Python, PyTorch, or TensorFlow versions.
Experiment Setup	Yes	We train our model using the Adam optimizer [65] for 3000 epochs with a fixed learning rate of 1 × 10⁻⁴. During training, we utilize the DDPM scheduler with a linear noise schedule ranging from 1 × 10⁻⁴ to 2 × 10⁻². For sampling, we adopt Denoising Diffusion Implicit Models (DDIM) [66], which enable high-quality sample generation in significantly fewer steps without compromising performance. We set the number of diffusion steps to T_diff = 1000 during training and reduce it to 50 during sampling for improved efficiency. Throughout both training and inference, we apply classifier-free guidance (CFG) to effectively condition the model on the visual stimulus [67]. In Appendix A.1: where c is the classifier-free guidance scale, which we set to 4 during inference. To enable this, we simulate the unconditional setting during training by randomly replacing the conditioning input I with a zero matrix in 10% of the training samples.
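
The hyperparameters quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the authors' code, which had not been released at the time of this record. The function names are hypothetical, the evenly strided timestep selection is an assumed DDIM-style choice, and the guidance formula is the commonly used form rather than the exact expression from Appendix A.1, which is not reproduced in the table.

```python
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Linearly spaced noise variances from beta_start to beta_end (inclusive)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def sampling_timesteps(num_train_steps=1000, num_sample_steps=50):
    """Evenly strided subset of the training timesteps, in descending order.

    Assumption: uniform striding, a common DDIM choice; the paper's exact
    spacing is not stated in the table.
    """
    stride = num_train_steps // num_sample_steps
    return list(range(0, num_train_steps, stride))[::-1]


def cfg_combine(eps_uncond, eps_cond, scale=4.0):
    """Blend unconditional and conditional noise predictions.

    Assumption: the common form eps_uncond + scale * (eps_cond - eps_uncond);
    the paper's Appendix A.1 may parameterize the scale c differently.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def maybe_drop_condition(image, drop_prob=0.1, rng=random):
    """With probability drop_prob, replace the conditioning image with zeros.

    `image` is a list of rows (a stand-in for the stimulus I); zeroing it
    simulates the unconditional setting during training.
    """
    if rng.random() < drop_prob:
        return [[0.0] * len(row) for row in image]
    return image


if __name__ == "__main__":
    betas = linear_beta_schedule()
    print(len(betas), betas[0], betas[-1])
    print(sampling_timesteps()[:3])
    print(cfg_combine([0.0, 1.0], [1.0, 1.0]))
```

In a real training loop these pieces would wrap a denoising network and an optimizer (Adam with a fixed learning rate of 1 × 10⁻⁴ according to the table); the sketch only shows how the reported numbers relate to one another.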