Latent Intrinsics Emerge from Training to Relight

Authors: Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David Forsyth, Anand Bhattad

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our approach produces SOTA relightings of real scenes, as measured by standard metrics. We train our model using the MIT multi-illumination dataset [38], which includes images of 1,015 indoor scenes captured under 25 fixed lighting conditions, totaling 25,375 images. We report the results, measured in RMSE and SSIM, in Table 1. We benchmark our albedo estimates using the WHDR metric on the IIW [5] dataset (Section 2)." (RMSE and SSIM computation is illustrated in the metrics sketch below the table.)
Researcher Affiliation | Academia | 1 University of Chicago, 2 University of Illinois Urbana-Champaign, 3 Toyota Technological Institute at Chicago
Pseudocode | No | The paper describes the model architecture and training process in text and diagrams, but does not include a formal pseudocode block or algorithm.
Open Source Code | Yes | https://latent-intrinsics.github.io/
Open Datasets | Yes | "We train our model using the MIT multi-illumination dataset [38], which includes images of 1,015 indoor scenes captured under 25 fixed lighting conditions, totaling 25,375 images."
Dataset Splits | No | The paper mentions training on "985 training scenes" and evaluating on a "test set" and a "held-out dataset", but does not explicitly provide a validation split (percentages, counts, or a specific pre-defined split). (The implied 985/30 scene split is worked out in the split sketch below the table.)
Hardware Specification | Yes | "We train our model with 4 A40 GPUs, and a complete training run requires 40 hours."
Software Dependencies | No | The paper mentions following Karras et al. [25] for the sampling standard deviation, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or versions of specific libraries).
Experiment Setup | Yes | "We train our model with a batch size of 256 for 1,000 epochs using the AdamW optimizer, with a constant learning rate of 2e-4 and a weight decay ratio of 1e-2. During training, we add random Gaussian noise to the input image to enhance semantic scene understanding capabilities." (This configuration is mirrored in the training sketch below the table.)
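
Metrics sketch. The Research Type row reports relighting quality in RMSE and SSIM (Table 1 of the paper). The paper does not publish its metric code, so the function below is a minimal sketch assuming NumPy and scikit-image; the function name and averaging choices are illustrative, not the authors' implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity

def relighting_metrics(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """RMSE and SSIM between a predicted relighting and the ground truth.

    Both inputs are float arrays in [0, 1] with shape (H, W, 3). This is an
    illustrative sketch; the paper does not specify its exact implementation.
    """
    rmse = float(np.sqrt(np.mean((pred - target) ** 2)))
    ssim = structural_similarity(pred, target, channel_axis=-1, data_range=1.0)
    return rmse, float(ssim)
```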
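Split sketch. The Dataset Splits row implies a 985/30 split of the 1,015 MIT multi-illumination scenes, with no validation set described. The arithmetic is shown below under stated assumptions: the `DATA_ROOT` path, one-directory-per-scene layout, and sorted ordering are hypothetical, not the authors' split.

```python
from pathlib import Path

# Hypothetical dataset root; the real download layout may differ.
DATA_ROOT = Path("multi_illumination")

# One directory per scene, each holding 25 images (one per lighting condition).
scenes = sorted(p.name for p in DATA_ROOT.iterdir() if p.is_dir())
assert len(scenes) == 1015, "expected 1,015 scenes in total"

# 985 training scenes, as stated in the paper; the remaining 30 are held out.
# No validation split is described, so none is carved out here.
train_scenes, test_scenes = scenes[:985], scenes[985:]
print(len(train_scenes), len(test_scenes))  # -> 985 30
```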
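Training sketch. The Experiment Setup row maps directly onto an optimizer configuration. Below is a minimal PyTorch sketch under stated assumptions: the model is a stand-in convolution, the loss is a placeholder, and the noise-level distribution uses the EDM log-normal defaults from Karras et al. [25], since the paper does not state the exact parameters it borrows.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the table above.
BATCH_SIZE = 256
EPOCHS = 1_000
LR = 2e-4
WEIGHT_DECAY = 1e-2

# Stand-in module: the paper's latent-intrinsics architecture is not reproduced here.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

def add_input_noise(images: torch.Tensor) -> torch.Tensor:
    """Add per-sample random Gaussian noise to an input batch.

    The paper follows Karras et al. [25] for the sampling standard deviation;
    the log-normal parameters here (mean -1.2, std 1.2) are the EDM defaults
    and an assumption, since the exact values are not stated.
    """
    log_sigma = torch.randn(images.shape[0], 1, 1, 1, device=images.device) * 1.2 - 1.2
    return images + log_sigma.exp() * torch.randn_like(images)

# One illustrative step (no real data loader or relighting loss shown).
images = torch.rand(BATCH_SIZE, 3, 64, 64)
loss = model(add_input_noise(images)).pow(2).mean()  # placeholder objective
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

A constant learning rate means no scheduler is attached, which matches the quoted setup.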