Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ROGR: Relightable 3D Objects using Generative Relighting

Authors: Jiapeng Tang, Matthew Levine, Dor Verbin, Stephan Garbin, Matthias Niessner, Ricardo Martin-Brualla, Pratul P. Srinivasan, Philipp Henzler

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on the established TensoIR and Stanford-ORB datasets, where it improves upon the state of the art on most metrics, and showcase our approach on real-world object captures.
Researcher Affiliation | Collaboration | Jiapeng Tang (1, 3), Matthew Levine (1), Dor Verbin (2), Stephan J. Garbin (1), Matthias Nießner (3), Ricardo Martin-Brualla (1), Pratul P. Srinivasan (2), Philipp Henzler (1). Affiliations: 1 Google Research; 2 Google DeepMind; 3 Technical University of Munich.
Pseudocode | No | The paper describes the methodology in prose and architectural diagrams (e.g., Figure 2, Figure 8), but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No]. Justification: We have not cleaned and released our data and code.
Open Datasets | Yes | Training datasets. To train the multi-view relighting diffusion model, we use a dataset of 400k synthetic 3D objects, including 100k from Objaverse [46]. Each object is rendered in 64 views × 16 HDR illuminations, with environment maps sampled from Polyhaven [47] (590 maps) and augmented via random up-axis rotations. Evaluation datasets. For relighting evaluations, we use two datasets: TensoIR [45] and Stanford-ORB [48].
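The training setup above implies a large image count, and the "random up-axis rotation" augmentation has a simple form if the environment maps are stored in equirectangular (latitude-longitude) layout: a rotation about the vertical axis only changes longitude, so it is a circular shift of each image row. The Python sketch below assumes that layout, a single-channel map, and whole-pixel rotation; the paper does not state its map format, and `rotate_env_map_up_axis` and `random_up_axis_augment` are hypothetical helpers. The render tally assumes every object is rendered in all 64 views under each of its 16 illuminations.

```python
import random
from typing import List

# Hypothetical representation: a single-channel equirectangular map stored as
# rows (latitude, top to bottom) x columns (longitude, wrapping around).
EnvMap = List[List[float]]

def rotate_env_map_up_axis(env: EnvMap, angle_deg: float) -> EnvMap:
    """Rotate an equirectangular map about the vertical (up) axis.

    Under this layout a yaw rotation only changes longitude, so it is a circular
    shift of every row. The shift is rounded to whole pixels, and the sign
    convention (positive angle = shift right) is an arbitrary choice.
    """
    width = len(env[0])
    shift = round(angle_deg / 360.0 * width) % width
    return [row[-shift:] + row[:-shift] for row in env]

def random_up_axis_augment(env: EnvMap, rng: random.Random) -> EnvMap:
    """Apply a uniformly random up-axis rotation (hypothetical augmentation helper)."""
    return rotate_env_map_up_axis(env, rng.uniform(0.0, 360.0))

# Render count implied by the stated setup, assuming every object is rendered
# in all 64 views under all 16 of its sampled illuminations.
NUM_OBJECTS = 400_000
VIEWS_PER_OBJECT = 64
ILLUMINATIONS_PER_OBJECT = 16
RENDERS_PER_OBJECT = VIEWS_PER_OBJECT * ILLUMINATIONS_PER_OBJECT  # 1,024
TOTAL_RENDERS = NUM_OBJECTS * RENDERS_PER_OBJECT                  # 409,600,000
```

For example, rotating a 4-column map by 90 degrees shifts each row by one column, so `[[0, 1, 2, 3]]` becomes `[[3, 0, 1, 2]]`; a full 360-degree rotation returns the map unchanged.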
Dataset Splits | Yes | Training datasets. To train the multi-view relighting diffusion model, we use a dataset of 400k synthetic 3D objects, including 100k from Objaverse [46]. Each object is rendered in 64 views × 16 HDR illuminations, with environment maps sampled from Polyhaven [47] (590 maps) and augmented via random up-axis rotations. Evaluation datasets. For relighting evaluations, we use two datasets: TensoIR [45] and Stanford-ORB [48]. TensoIR is a synthetic benchmark containing renderings of four objects under six lighting conditions. We use the training split of 100 views under the sunset lighting condition as input to the relightable NeRF. We then evaluate 200 novel views under the other five environment maps: bridge, fireplace, forest, city, and night. In total, we have 4,000 renderings for computing the evaluation metrics. Stanford-ORB is a real-world benchmark captured in the wild. It has 14 objects composed of various materials. Each object is captured under three distinct lighting conditions, producing a total of 42 (object, lighting) combinations. Following its evaluation protocol, we use images of an object under a single lighting condition and evaluate novel views under the two target lighting settings.
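The evaluation counts quoted above follow from simple products, which the Python sketch below makes explicit. The TensoIR and first Stanford-ORB figures come straight from the row; the last Stanford-ORB count rests on an assumption the row does not state, namely that each of the 42 (object, lighting) combinations serves once as the input condition and is evaluated under the other two lightings.

```python
# Counts for the two evaluation benchmarks as described in the Dataset Splits row.

# TensoIR: 4 synthetic objects, 6 lighting conditions. Inputs use "sunset";
# 200 novel test views are evaluated under each of the remaining five maps.
TENSOIR_OBJECTS = 4
TENSOIR_LIGHTINGS = ["sunset", "bridge", "fireplace", "forest", "city", "night"]
TENSOIR_INPUT_LIGHTING = "sunset"
TENSOIR_TEST_VIEWS = 200

tensoir_targets = [name for name in TENSOIR_LIGHTINGS if name != TENSOIR_INPUT_LIGHTING]
tensoir_eval_images = TENSOIR_OBJECTS * len(tensoir_targets) * TENSOIR_TEST_VIEWS

# Stanford-ORB: 14 real objects, each captured under 3 lighting conditions.
ORB_OBJECTS = 14
ORB_LIGHTINGS = 3
orb_object_lighting_pairs = ORB_OBJECTS * ORB_LIGHTINGS

# Assumption (not stated in the row): every (object, lighting) pair serves once
# as the input condition and is evaluated under the other two lightings.
orb_relighting_cases = orb_object_lighting_pairs * (ORB_LIGHTINGS - 1)

print(tensoir_eval_images, orb_object_lighting_pairs, orb_relighting_cases)  # prints: 4000 42 84
```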
Hardware Specification | Yes | The model was trained on 128 TPU v5 chips using a learning rate of 10⁻⁴, with a total batch size of 128 for 360k iterations. After training, we generate the multi-illumination dataset by running our relighting diffusion inference on 111 environment maps. Relightable NeRF model. We train our NeRF on 8 H100s for 500k steps.
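Together with the Experiment Setup row, these numbers pin down the diffusion model's optimization schedule: a learning rate of 10⁻⁴, 10k warm-up steps, a global batch of 128, and a progressive 4 → 16 → 64-view curriculum. The Python sketch below encodes that schedule as two lookups, assuming a linear warm-up applied once at the start of training and a constant rate afterwards (the paper says only that the rate stays at 10⁻⁴ during fine-tuning). `learning_rate` and `views_at_step` are hypothetical helpers, and the final stage is taken as 45k steps because that is the length that makes the three stages sum to the reported 360k iterations.

```python
PEAK_LR = 1e-4
WARMUP_STEPS = 10_000
GLOBAL_BATCH = 128

# (views per training example, steps) for each progressive stage. The last stage
# is taken as 45k steps, the length that makes the stages sum to 360k.
STAGES = [(4, 300_000), (16, 15_000), (64, 45_000)]
TOTAL_STEPS = sum(steps for _, steps in STAGES)

def learning_rate(step: int) -> float:
    """Assumed linear warm-up to PEAK_LR, then constant through all stages."""
    return PEAK_LR * min(1.0, (step + 1) / WARMUP_STEPS)

def views_at_step(step: int) -> int:
    """Number of views the diffusion model relights jointly at a global step."""
    stage_start = 0
    for num_views, num_steps in STAGES:
        if step < stage_start + num_steps:
            return num_views
        stage_start += num_steps
    raise ValueError(f"step {step} is beyond the {TOTAL_STEPS}-step schedule")

# Optimizer steps and multi-view training examples processed in total.
print(TOTAL_STEPS, TOTAL_STEPS * GLOBAL_BATCH)  # prints: 360000 46080000
```

Whether the warm-up restarts at each fine-tuning stage is not stated; this sketch warms up only once.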
Software Dependencies | No | We implement our multi-view relighting diffusion model using JAX [58]. It is initialized from a pre-trained latent diffusion model for text-to-image generation, similar to Stable Diffusion [49]. The paper mentions software tools such as JAX and Stable Diffusion but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The model was trained on 128 TPU v5 chips using a learning rate of 10⁻⁴, with a total batch size of 128 for 360k iterations. After training, we generate the multi-illumination dataset by running our relighting diffusion inference on 111 environment maps. Relightable NeRF model. We train our NeRF on 8 H100s for 500k steps. We use a 512×512 resolution environment map as the target illumination. We sample each reflection ray 3 times: once using a point sample on the full-resolution environment map, and then using Gaussian kernels of sizes 20×20 and 40×40 pixels in radius with σᵢ values of 10 and 20, respectively (see Fig. 3). To maximize the number of illumination conditions we use for training, we make several reductions to the size of the model relative to the NeRF-Casting architecture. We lower the batch size to 1,000 and increase the number of training steps to 500,000. We also decrease the size of the bottleneck vector b in both the geometry and appearance MLPs relative to NeRF-Casting. Please refer to the supplementary material for more details on the base architecture. During training, we use the DDPM schedule, with beta values that increase linearly from 8.5×10⁻⁴ to 1.2×10⁻² over 1024 steps. We use noise prediction as our diffusion objective. The model was trained on 128 TPU v5 chips using a learning rate of 10⁻⁴, with a total batch size of 128 for 360k iterations and 10k warm-up steps. We adopted a progressive training scheme: we first trained a 4-view diffusion model for 300k steps, then fine-tuned it for 16-view diffusion for 15k steps, and finally fine-tuned it for 64-view diffusion for 45k steps. We keep the learning rate at 10⁻⁴ when we fine-tune the model to relight the larger number of views. We enable classifier-free guidance (CFG) [60] by randomly dropping the HDR and LDR environment maps with a probability of 0.1.
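The relightable NeRF description above calls for three lookups per reflection ray: a point sample on the 512×512 environment map plus two Gaussian-filtered samples (radius 20 px with σ = 10, radius 40 px with σ = 20). The Python sketch below shows one plausible reading of that description, assuming a single-channel equirectangular map, that the stated sizes are kernel radii, wrap-around in longitude, and clamping at the poles. The direction-to-pixel mapping is omitted, so `(row, col)` stands in for wherever the reflected direction lands; `gaussian_kernel`, `filtered_lookup`, and `reflection_samples` are hypothetical names, and how the NeRF consumes the three values is not shown.

```python
import math
from typing import List

# Hypothetical single-channel equirectangular map: rows = latitude, cols = longitude.
EnvMap = List[List[float]]

def gaussian_kernel(radius: int, sigma: float) -> List[List[float]]:
    """Normalized (2*radius+1) x (2*radius+1) Gaussian weights."""
    weights = [
        [math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
         for dx in range(-radius, radius + 1)]
        for dy in range(-radius, radius + 1)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

def filtered_lookup(env: EnvMap, row: int, col: int, radius: int, sigma: float) -> float:
    """Gaussian-weighted average around (row, col).

    Columns wrap around (longitude is periodic); rows are clamped at the poles,
    which is a simplification of true spherical filtering.
    """
    height, width = len(env), len(env[0])
    kernel = gaussian_kernel(radius, sigma)
    total = 0.0
    for dy in range(-radius, radius + 1):
        r = min(max(row + dy, 0), height - 1)
        for dx in range(-radius, radius + 1):
            c = (col + dx) % width
            total += kernel[dy + radius][dx + radius] * env[r][c]
    return total

def reflection_samples(env: EnvMap, row: int, col: int) -> List[float]:
    """Three lookups for one reflection ray hitting pixel (row, col)."""
    return [
        env[row][col],                             # point sample, full resolution
        filtered_lookup(env, row, col, 20, 10.0),  # radius 20 px, sigma 10
        filtered_lookup(env, row, col, 40, 20.0),  # radius 40 px, sigma 20
    ]
```

Filtering per ray keeps the sketch short; precomputing the two blurred maps once would be the obvious optimization in practice.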
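The last sentence of the Experiment Setup row describes condition dropout for classifier-free guidance: during training, the lighting inputs are removed with probability 0.1 so that one network also learns an unconditional prediction. The Python sketch below is a minimal illustration assuming that the HDR and LDR maps are dropped jointly and that `None` marks the null condition; the actual model might instead substitute zeros or a learned embedding, and `maybe_drop_lighting` is a hypothetical helper.

```python
import random
from typing import Optional, Tuple, TypeVar

Cond = TypeVar("Cond")

DROP_PROB = 0.1  # dropout probability stated in the paper

def maybe_drop_lighting(hdr_env: Cond, ldr_env: Cond, rng: random.Random,
                        p: float = DROP_PROB) -> Tuple[Optional[Cond], Optional[Cond]]:
    """Return (None, None) with probability p, otherwise the conditions unchanged.

    Assumptions: both maps are dropped together, and None stands in for
    whatever null condition (zeros, a learned embedding, ...) the model uses.
    """
    if rng.random() < p:
        return None, None
    return hdr_env, ldr_env
```

Because the same network sees both conditioned and unconditioned examples, a single model can supply both predictions that guidance combines at inference.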
During inference, we use the DDIM schedule [61] with 50 sampling steps, and the classifier-free guidance weight is set to 3.0.
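The diffusion settings in the Experiment Setup row (a linear DDPM β schedule from 8.5×10⁻⁴ to 1.2×10⁻² over 1024 steps, noise prediction, and 50 DDIM steps with guidance weight 3.0) can be sketched as follows. This is a minimal Python illustration that uses scalars in place of latent tensors, the deterministic DDIM update (η = 0), evenly spaced timesteps, and the guidance form ε_u + w(ε_c − ε_u). The paper does not state its timestep spacing, its η, or which of the common guidance conventions its weight of 3.0 refers to, and all function names here are hypothetical.

```python
import math
from typing import Callable, List

T = 1024
BETA_START, BETA_END = 8.5e-4, 1.2e-2

def linear_betas(t: int = T, start: float = BETA_START, end: float = BETA_END) -> List[float]:
    """Betas increasing linearly from start to end over t steps."""
    return [start + (end - start) * i / (t - 1) for i in range(t)]

def alphas_cumprod(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def ddim_timesteps(num_steps: int = 50, t: int = T) -> List[int]:
    """num_steps evenly spaced training timesteps, from noisiest to cleanest."""
    return [round((t - 1) * (num_steps - 1 - i) / (num_steps - 1)) for i in range(num_steps)]

def guided_eps(eps_cond: float, eps_uncond: float, w: float = 3.0) -> float:
    """Classifier-free guidance in the eps_u + w * (eps_c - eps_u) convention."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def ddim_step(x_t: float, eps: float, ab_t: float, ab_prev: float) -> float:
    """Deterministic DDIM update (eta = 0) for a noise-prediction model."""
    x0_pred = (x_t - math.sqrt(1.0 - ab_t) * eps) / math.sqrt(ab_t)
    return math.sqrt(ab_prev) * x0_pred + math.sqrt(1.0 - ab_prev) * eps

def sample(x_T: float, model: Callable[[float, int, bool], float], w: float = 3.0) -> float:
    """Run 50 guided DDIM steps; model(x, t, conditioned) returns predicted noise."""
    ab = alphas_cumprod(linear_betas())
    steps = ddim_timesteps()
    x = x_T
    for i, t in enumerate(steps):
        eps = guided_eps(model(x, t, True), model(x, t, False), w)
        # After the last timestep, step to alpha_bar = 1 (the clean sample).
        ab_prev = ab[steps[i + 1]] if i + 1 < len(steps) else 1.0
        x = ddim_step(x, eps, ab[t], ab_prev)
    return x
```

With a model that always returns the true noise, `sample` recovers the clean value up to rounding, which is a quick check that the update and the ᾱ bookkeeping agree.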