Neural Gaffer: Relighting Any Object via Diffusion

Authors: Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, Noah Snavely

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on both synthetic and in-the-wild Internet imagery and demonstrate its advantages in terms of generalization and accuracy.
Researcher Affiliation | Collaboration | 1. Cornell Tech, Cornell University; 2. Zhejiang University; 3. Adobe Research; 4. University of Georgia
Pseudocode | No | The paper describes its methods through text and diagrams (e.g., Figure 2 and Figure 3) but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | We will release all code and models upon acceptance.
Open Datasets | Yes | We use Objaverse [20] as our data source, which comprises about 800K synthetic 3D object models of varying quality.
Dataset Splits | Yes | We select 48 high-quality objects from Objaverse as validation objects, which are unseen during training. We render each object under 4 different camera poses. For each camera, we randomly sample 12 unseen environment maps to render the target relit images, and one additional environment map to render the input.
Hardware Specification | Yes | We fine-tune our model for 80K iterations on 8 A6000 GPUs for 5 days.
Software Dependencies | No | The paper refers to various models and tools, such as the Cycles renderer from Blender, AdamW, the Zero-1-to-3 model, Stable Diffusion, ControlNet, Text2Light, SAM, DiffusionLight, and TensoRF, but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | We fine-tune our model starting from Zero123's [42] checkpoint and discard its original linear projection layer for image embedding and pose. We only fine-tune the UNet of the diffusion model and freeze other parts. During fine-tuning, we use a reduced image size of 256×256 and a total batch size of 1024. Both the LDR and normalized HDR environment maps are resized to 256×256. We use AdamW [43] and set the learning rate to 10^-4 for training. We fine-tune our model for 80K iterations on 8 A6000 GPUs for 5 days.
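
Taken at face value, the Dataset Splits row implies 48 objects × 4 camera poses, with 12 target environment maps plus 1 input environment map per camera, i.e. 2,304 target relit images and 192 input renders. The sketch below enumerates such a validation manifest under those assumptions; the object identifiers, environment-map pool, and field names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical identifiers; the paper does not list the 48 validation objects
# or the pool of unseen environment maps, so these names are placeholders.
validation_objects = [f"objaverse_val_{i:02d}" for i in range(48)]
envmap_pool = [f"envmap_{i:04d}" for i in range(1000)]  # unseen during training

manifest = []
for obj in validation_objects:
    for cam in range(4):  # 4 camera poses per object
        # One randomly sampled environment map renders the input image;
        # 12 more render the target relit images.
        maps = random.sample(envmap_pool, 13)
        input_env, target_envs = maps[0], maps[1:]
        manifest.append({
            "object": obj,
            "camera": cam,
            "input_env": input_env,
            "target_envs": target_envs,
        })

num_inputs = len(manifest)                                   # 48 * 4 = 192
num_targets = sum(len(m["target_envs"]) for m in manifest)   # 192 * 12 = 2304
print(num_inputs, num_targets)
```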
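
The Experiment Setup row describes a fairly standard latent-diffusion fine-tuning recipe: start from a Zero123-style checkpoint, freeze everything except the UNet, and train with AdamW at a learning rate of 10^-4. Below is a minimal PyTorch sketch of that parameter-freezing and optimizer setup; the module names (`Zero123StylePipeline`, `unet`, `vae`, `image_encoder`) are assumptions, and the actual Neural Gaffer training code may be organized differently.

```python
import torch

# Hypothetical container for a Zero123-style latent diffusion model;
# the real checkpoint layout is not specified in the table above.
class Zero123StylePipeline(torch.nn.Module):
    def __init__(self, unet, vae, image_encoder):
        super().__init__()
        self.unet = unet                    # denoising UNet (fine-tuned)
        self.vae = vae                      # latent autoencoder (kept frozen)
        self.image_encoder = image_encoder  # conditioning encoder (kept frozen)

def configure_finetuning(pipeline: Zero123StylePipeline) -> torch.optim.Optimizer:
    # Freeze every parameter, then unfreeze only the UNet, matching
    # "We only fine-tune the UNet of the diffusion model and freeze other parts."
    for p in pipeline.parameters():
        p.requires_grad_(False)
    for p in pipeline.unet.parameters():
        p.requires_grad_(True)

    # AdamW with lr = 1e-4, as stated in the Experiment Setup row.
    return torch.optim.AdamW(pipeline.unet.parameters(), lr=1e-4)
```

In practice, the stated total batch size of 1024 at 256×256 resolution would be reached on 8 A6000 GPUs by sharding the batch across devices and/or accumulating gradients; the table does not say which combination the authors used.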