Latent Intrinsics Emerge from Training to Relight
Authors: Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David Forsyth, Anand Bhattad
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach produces SOTA relightings of real scenes, as measured by standard metrics. We train our model using the MIT multi-illumination dataset [38], which includes images of 1,015 indoor scenes captured under 25 fixed lighting conditions, totaling 25,375 images. We report the results, measured in RMSE and SSIM, in Table 1. We benchmark our albedo estimates using the WHDR metric on the IIW [5] dataset (Section 2). |
| Researcher Affiliation | Academia | 1 University of Chicago, 2 University of Illinois Urbana-Champaign, 3 Toyota Technological Institute at Chicago |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams, but does not include a formal pseudocode block or algorithm. |
| Open Source Code | Yes | https://latent-intrinsics.github.io/ |
| Open Datasets | Yes | We train our model using the MIT multi-illumination dataset [38], which includes images of 1,015 indoor scenes captured under 25 fixed lighting conditions, totaling 25,375 images. |
| Dataset Splits | No | The paper mentions training on "985 training scenes" and evaluating on a "test set" and a "held-out dataset", but does not explicitly provide details for a validation split (percentages, counts, or specific pre-defined splits). |
| Hardware Specification | Yes | We train our model with 4 A40 GPUs, and complete training requires 40 hours. |
| Software Dependencies | No | The paper mentions following Karras et al. [25] for sampling standard deviation, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | We train our model with a batch size of 256 for 1,000 epochs using the AdamW optimizer, with a constant learning rate of 2e-4 and a weight decay ratio of 1e-2. During training, we add random Gaussian noise to the input image to enhance semantic scene understanding capabilities. |
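The reported experiment setup (AdamW, learning rate 2e-4, weight decay 1e-2, Gaussian noise on inputs) can be sketched in PyTorch as below. This is a minimal illustration, not the authors' code: the placeholder model and the noise standard deviation `NOISE_STD` are assumptions, since the paper excerpt does not specify them.

```python
import torch
from torch import nn

# Hyperparameters as reported in the paper's experiment setup.
LR = 2e-4
WEIGHT_DECAY = 1e-2
BATCH_SIZE = 256
NOISE_STD = 0.1  # assumption: the excerpt does not state the noise magnitude

# Placeholder model standing in for the relighting network.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1))

# AdamW with the constant learning rate and weight decay from the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

def add_gaussian_noise(images: torch.Tensor, std: float = NOISE_STD) -> torch.Tensor:
    """Perturb input images with random Gaussian noise, as in the training recipe."""
    return images + torch.randn_like(images) * std

# Small stand-in batch to show the augmentation preserves the input shape.
batch = torch.rand(4, 3, 32, 32)
noisy = add_gaussian_noise(batch)
assert noisy.shape == batch.shape
```

In an actual training loop, the noisy images would be fed to the model for 1,000 epochs at batch size 256 over the MIT multi-illumination training scenes.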