Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

LuxDiT: Lighting Estimation with Video Diffusion Transformer

Authors: Ruofan Liang, Kai He, Zan Gojcic, Igor Gilitschenski, Sanja Fidler, Nandita Vijaykumar, Zian Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that LuxDiT produces accurate, high-frequency, and scene-consistent lighting predictions from limited visual input. Table 1 reports quantitative comparisons on three benchmarks spanning both indoor and outdoor scenes.
Researcher Affiliation | Collaboration | ¹NVIDIA, ²University of Toronto, ³Vector Institute
Pseudocode | No | The paper only describes steps in regular paragraph text with mathematical equations, but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | We plan to release the code and data upon acceptance. The internal guidelines of our institution prevent us from releasing code at this stage.
Open Datasets | Yes | We evaluate our method on the following three benchmark datasets, covering various indoor and outdoor scenes. 1) Laval Indoor [15]: We use the same set of 289 test HDRIs used by prior works [45, 55]; 2) Laval Outdoor [23]: We evaluate on 116 sunny HDR panoramas with concentrated sunlight selected from the original dataset; 3) Poly Haven [67]: We select 181 Poly Haven HDRIs not used during model training to evaluate performance across both indoor and outdoor scenes. We use 2,000 panoramic videos from the WEB360 dataset [57] for training, and hold out 114 videos for evaluation.
Dataset Splits | Yes | 1) Laval Indoor [15]: We use the same set of 289 test HDRIs used by prior works [45, 55]; 2) Laval Outdoor [23]: We evaluate on 116 sunny HDR panoramas with concentrated sunlight selected from the original dataset; 3) Poly Haven [67]: We select 181 Poly Haven HDRIs not used during model training to evaluate performance across both indoor and outdoor scenes. We use 2,000 panoramic videos from the WEB360 dataset [57] for training, and hold out 114 videos for evaluation.
Hardware Specification | Yes | All training is conducted on 16 NVIDIA A100 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for key software components or libraries, only mentioning general tools like Blender and OptiX without version details, and a pre-trained model (CogVideoX) as a backbone.
Experiment Setup | Yes | Input resolutions are randomly sampled between 512×512 and 480×720, and output environment map resolutions are between 128×256 and 256×512. The image-based model is trained with a batch size of 192 for 12,000 iterations. For video training, we use the same spatial resolutions and uniformly sample frame lengths from {9, 17, 25}. The video model is trained with an average batch size of 48 for an additional 12,000 iterations. LoRA modules are applied to all attention layers with a rank of 64. We fine-tune the LoRA parameters for 5,000 iterations during the adaptation stage.
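
For readers who want the reported training schedule in machine-readable form, the numbers quoted in the Experiment Setup and Hardware Specification rows can be collected into a small configuration sketch. This is only an illustration: the class, field, and function names are hypothetical and do not come from the paper or its (unreleased) code, and the per-GPU figure assumes the reported batch size is global and split evenly across the 16 GPUs, which the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """One training stage, using values quoted in the Experiment Setup row."""
    name: str
    batch_size: int   # for the video stage this is an *average* batch size
    iterations: int

    def samples_seen(self) -> int:
        # Rough estimate: batch size multiplied by the number of iterations.
        return self.batch_size * self.iterations


# Values quoted in the table; all names here are illustrative.
INPUT_RESOLUTIONS = ((512, 512), (480, 720))    # sampling range endpoints
ENVMAP_RESOLUTIONS = ((128, 256), (256, 512))   # output environment maps
VIDEO_FRAME_LENGTHS = (9, 17, 25)               # sampled uniformly
LORA_RANK = 64                                  # applied to attention layers
LORA_ITERATIONS = 5_000                         # adaptation stage
NUM_GPUS = 16                                   # NVIDIA A100

IMAGE_STAGE = StageConfig("image", batch_size=192, iterations=12_000)
VIDEO_STAGE = StageConfig("video", batch_size=48, iterations=12_000)


def per_gpu_batch(stage: StageConfig, num_gpus: int = NUM_GPUS) -> float:
    # ASSUMPTION: the reported batch size is global and evenly sharded.
    return stage.batch_size / num_gpus


if __name__ == "__main__":
    for stage in (IMAGE_STAGE, VIDEO_STAGE):
        print(stage.name, stage.samples_seen(), per_gpu_batch(stage))
```

Under these assumptions, the image stage processes about 2.3 million samples and the video stage about 576,000 clips, which gives a sense of the relative cost of the two stages.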