Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation

Authors: Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive evaluations demonstrating our model's effectiveness against state-of-the-art methods on the video relighting task. ... We report 5 metrics for evaluation. The visual quality of the generated videos is evaluated by computing the FVD [48], LPIPS [49] and PSNR [50] scores against the results of the existing methods. Text alignment is quantified as the mean CLIP [51] cosine similarity score between each individual frame and the text prompt. Temporal consistency, on the other hand, is measured as the average CLIP cosine similarity score across every pair of consecutive frames. ... Ablation Studies
Researcher Affiliation	Collaboration	University of Oxford, UC Merced, NEC Labs America, Atmanity Inc., Google DeepMind
Pseudocode	No	The paper describes the model architecture and data collection pipeline through textual descriptions and diagrams (Figure 2, Figure 3), but it does not include explicit pseudocode or algorithm blocks.
Open Source Code	No	We will release the model and curated dataset to support future research.
Open Datasets	Yes	We collect 20,170 high-quality, free-to-use videos from Pexels (https://www.pexels.com/) for training.
Dataset Splits	Yes	We collect 20,170 high-quality, free-to-use videos from Pexels (https://www.pexels.com/) for training. ... For evaluation, we collect an additional 50 high-quality Pexels videos and show the results on this set.
Hardware Specification	Yes	Models are trained for 3,000 iterations on 4 A6000 GPUs... All experiments were conducted on an NVIDIA A6000 GPU.
Software Dependencies	No	The paper mentions several tools and models used, such as DiffusionLight [41], Video Depth Anything [42], Grounded SAM-2 [43], MatAnyone [44], Light-A-Video [1], DiffEraser [45], SpatialTracker [46], CogVLM2-Video LLaMA3-Chat [47], OpenCV, and the AdamW optimizer [52], but it does not specify version numbers for these software dependencies.
Experiment Setup	Yes	Models are trained for 3,000 iterations on 4 A6000 GPUs with a batch size of 2 and gradient accumulation over 4 steps, taking approximately 90 hours. Videos are center-cropped and resized to 720 × 480 (width × height) resolution with 49 frames. We use the AdamW optimizer [52] with a learning rate of 4e-5, weight decay of 0.001, and 100 warmup steps. The learning rate follows a cosine schedule with restarts.
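
For readers trying to reproduce the setup quoted in the Experiment Setup row, the following is a minimal Python sketch of one plausible reading of the learning-rate schedule and the effective batch size. It is not the authors' code. The quoted text does not state the number of restart cycles, whether the batch size of 2 is per GPU or global, or whether the 3,000 iterations count optimizer steps, so `NUM_CYCLES`, the `per_gpu` flag, and the step interpretation are all assumptions introduced here for illustration. The schedule formula (linear warmup followed by cosine decay with hard restarts) is a common formulation, not one confirmed by the paper.

```python
import math

# Values quoted in the Experiment Setup row.
BASE_LR = 4e-5
WARMUP_STEPS = 100
TOTAL_STEPS = 3000        # assumed here to count optimizer steps
NUM_GPUS = 4
BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4

# Assumption: the number of restart cycles is not reported; 1 is a placeholder.
NUM_CYCLES = 1


def lr_at(step, num_cycles=NUM_CYCLES):
    """Learning rate at a step: linear warmup, then cosine decay with hard restarts."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    if progress >= 1.0:
        return 0.0
    # Position inside the current cosine cycle, in [0, 1).
    cycle_pos = (num_cycles * progress) % 1.0
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * cycle_pos))


def effective_batch_size(per_gpu=True):
    """Samples per optimizer step under the two possible readings of 'batch size of 2'."""
    if per_gpu:
        # Reading 1 (assumption): 2 samples on each of the 4 GPUs.
        return BATCH_SIZE * NUM_GPUS * GRAD_ACCUM_STEPS
    # Reading 2 (assumption): 2 samples in total across all GPUs.
    return BATCH_SIZE * GRAD_ACCUM_STEPS


if __name__ == "__main__":
    for step in (0, 50, 100, 1550, 3000):
        print(step, lr_at(step))
    print("effective batch (per-GPU reading):", effective_batch_size(True))
    print("effective batch (global reading):", effective_batch_size(False))
```

The two readings of the batch size differ by a factor of four in samples per optimizer step, which is one reason unreported details like this matter for reproduction.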