Latent Intrinsics Emerge from Training to Relight

Authors: Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David Forsyth, Anand Bhattad

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our approach produces SOTA relightings of real scenes, as measured by standard metrics. We train our model using the MIT multi-illumination dataset [38], which includes images of 1,015 indoor scenes captured under 25 fixed lighting conditions, totaling 25,375 images. We report the results, measured in RMSE and SSIM, in Table 1. We benchmark our albedo estimates using the WHDR metric on the IIW [5] dataset (Section 2)." (RMSE and SSIM computation is illustrated in the metrics sketch below the table.)
Researcher Affiliation | Academia | 1 University of Chicago, 2 University of Illinois Urbana-Champaign, 3 Toyota Technological Institute at Chicago
Pseudocode | No | The paper describes the model architecture and training process in text and diagrams, but does not include a formal pseudocode block or algorithm.
Open Source Code | Yes | https://latent-intrinsics.github.io/
Open Datasets | Yes | "We train our model using the MIT multi-illumination dataset [38], which includes images of 1,015 indoor scenes captured under 25 fixed lighting conditions, totaling 25,375 images."
Dataset Splits | No | The paper mentions training on "985 training scenes" and evaluating on a "test set" and a "held-out dataset", but does not explicitly provide a validation split (percentages, counts, or a specific pre-defined split). (The implied 985/30 scene split is worked out in the split sketch below the table.)
Hardware Specification | Yes | "We train our model with 4 A40 GPUs, and a complete training run requires 40 hours."
Software Dependencies | No | The paper mentions following Karras et al. [25] for the sampling standard deviation, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or versions of specific libraries).
Experiment Setup | Yes | "We train our model with a batch size of 256 for 1,000 epochs using the AdamW optimizer, with a constant learning rate of 2e-4 and a weight decay ratio of 1e-2. During training, we add random Gaussian noise to the input image to enhance semantic scene understanding capabilities." (This configuration is mirrored in the training sketch below the table.)
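
Metrics sketch. The Research Type row reports relighting quality in RMSE and SSIM (Table 1 of the paper). The paper does not publish its metric code, so the function below is a minimal sketch assuming NumPy and scikit-image; the function name and averaging choices are illustrative, not the authors' implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity

def relighting_metrics(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """RMSE and SSIM between a predicted relighting and the ground truth.

    Both inputs are float arrays in [0, 1] with shape (H, W, 3). This is an
    illustrative sketch; the paper does not specify its exact implementation.
    """
    rmse = float(np.sqrt(np.mean((pred - target) ** 2)))
    ssim = structural_similarity(pred, target, channel_axis=-1, data_range=1.0)
    return rmse, float(ssim)
```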
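Split sketch. The Dataset Splits row implies a 985/30 split of the 1,015 MIT multi-illumination scenes, with no validation set described. The arithmetic is shown below under stated assumptions: the `DATA_ROOT` path, one-directory-per-scene layout, and sorted ordering are hypothetical, not the authors' split.

```python
from pathlib import Path

# Hypothetical dataset root; the real download layout may differ.
DATA_ROOT = Path("multi_illumination")

# One directory per scene, each holding 25 images (one per lighting condition).
scenes = sorted(p.name for p in DATA_ROOT.iterdir() if p.is_dir())
assert len(scenes) == 1015, "expected 1,015 scenes in total"

# 985 training scenes, as stated in the paper; the remaining 30 are held out.
# No validation split is described, so none is carved out here.
train_scenes, test_scenes = scenes[:985], scenes[985:]
print(len(train_scenes), len(test_scenes))  # -> 985 30
```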
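Training sketch. The Experiment Setup row maps directly onto an optimizer configuration. Below is a minimal PyTorch sketch under stated assumptions: the model is a stand-in convolution, the loss is a placeholder, and the noise-level distribution uses the EDM log-normal defaults from Karras et al. [25], since the paper does not state the exact parameters it borrows.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the table above.
BATCH_SIZE = 256
EPOCHS = 1_000
LR = 2e-4
WEIGHT_DECAY = 1e-2

# Stand-in module: the paper's latent-intrinsics architecture is not reproduced here.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

def add_input_noise(images: torch.Tensor) -> torch.Tensor:
    """Add per-sample random Gaussian noise to an input batch.

    The paper follows Karras et al. [25] for the sampling standard deviation;
    the log-normal parameters here (mean -1.2, std 1.2) are the EDM defaults
    and an assumption, since the exact values are not stated.
    """
    log_sigma = torch.randn(images.shape[0], 1, 1, 1, device=images.device) * 1.2 - 1.2
    return images + log_sigma.exp() * torch.randn_like(images)

# One illustrative step (no real data loader or relighting loss shown).
images = torch.rand(BATCH_SIZE, 3, 64, 64)
loss = model(add_input_noise(images)).pow(2).mean()  # placeholder objective
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

A constant learning rate means no scheduler is attached, which matches the quoted setup.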