Neural Gaffer: Relighting Any Object via Diffusion
Authors: Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, Noah Snavely
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on both synthetic and in-the-wild Internet imagery and demonstrate its advantages in terms of generalization and accuracy. |
| Researcher Affiliation | Collaboration | ¹Cornell Tech, Cornell University; ²Zhejiang University; ³Adobe Research; ⁴University of Georgia |
| Pseudocode | No | The paper describes its methods through text and diagrams (e.g., Figure 2 and Figure 3) but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We will release all code and models upon acceptance. |
| Open Datasets | Yes | We use Objaverse [20] as our data source, which comprises about 800K synthetic 3D object models of varying quality. |
| Dataset Splits | Yes | We select 48 high-quality objects from Objaverse as validation objects, which are unseen during training. We render each object under 4 different camera poses. For each camera, we randomly sample 12 unseen environment maps to render the target relit images, and one additional environment map to render the input. (A hedged enumeration sketch of this protocol follows the table.) |
| Hardware Specification | Yes | We fine-tune our model for 80K iterations on 8 A6000 GPUs for 5 days. |
| Software Dependencies | No | The paper refers to various models and tools such as the 'Cycles renderer from Blender', 'AdamW', the 'Zero-1-to-3 model', 'Stable Diffusion', 'ControlNet', 'Text2Light', 'SAM', 'DiffusionLight', and 'TensoRF', but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | We fine-tune our model starting from Zero123's [42] checkpoint and discard its original linear projection layer for image embedding and pose. We only fine-tune the UNet of the diffusion model and freeze other parts. During fine-tuning, we use a reduced image size of 256×256 and a total batch size of 1024. Both the LDR and normalized HDR environment maps are resized to 256×256. We use AdamW [43] and set the learning rate to 10⁻⁴ for training. We fine-tune our model for 80K iterations on 8 A6000 GPUs for 5 days. (A hedged fine-tuning sketch follows the table.) |
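For concreteness, here is a minimal Python sketch of how the validation protocol quoted above (48 held-out Objaverse objects × 4 camera poses, with 12 unseen target environment maps plus 1 input map per camera) could be enumerated. The object and environment-map identifiers, the pool size, and the function name are placeholders for illustration, not the authors' actual asset lists.

```python
import random

# Placeholder identifiers; the real 48 validation object IDs and the
# environment-map pool come from Objaverse and the authors' (unreleased) lists.
VAL_OBJECTS = [f"objaverse_val_{i:02d}" for i in range(48)]
CAMERA_POSES = range(4)
ENV_MAPS = [f"envmap_{i:04d}" for i in range(500)]  # pool size is an assumption

def build_validation_pairs(seed=0):
    """Enumerate (object, camera, input_env, target_env) relighting pairs:
    per camera, 1 environment map renders the input and 12 others the targets."""
    rng = random.Random(seed)
    pairs = []
    for obj in VAL_OBJECTS:
        for cam in CAMERA_POSES:
            maps = rng.sample(ENV_MAPS, 13)  # 12 targets + 1 input, all distinct
            input_env, target_envs = maps[0], maps[1:]
            pairs.extend((obj, cam, input_env, t) for t in target_envs)
    return pairs  # 48 objects * 4 cameras * 12 targets = 2304 relit images
```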
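Likewise, a minimal PyTorch-style sketch of the reported fine-tuning recipe: freeze everything except the UNet, optimize with AdamW at a learning rate of 10⁻⁴, and train for 80K iterations at a total batch size of 1024. The pipeline attribute names (`unet`, `vae`, `image_encoder`) follow common latent-diffusion conventions and are assumptions, since the authors' code had not been released at the time of this report.

```python
import torch

def configure_finetuning(pipeline):
    """Freeze all diffusion-model parts except the UNet, as the paper reports.

    `pipeline.vae` and `pipeline.image_encoder` are assumed attribute names
    for the frozen components; `pipeline.unet` is the trained denoiser.
    """
    for module in (pipeline.vae, pipeline.image_encoder):
        for p in module.parameters():
            p.requires_grad_(False)
    for p in pipeline.unet.parameters():
        p.requires_grad_(True)

    # AdamW with the reported learning rate of 1e-4.
    return torch.optim.AdamW(pipeline.unet.parameters(), lr=1e-4)

# Reported schedule: 80K iterations at a total batch size of 1024 across
# 8 A6000 GPUs (how the 1024 is split per GPU is not specified in the paper).
TOTAL_ITERS = 80_000
TOTAL_BATCH_SIZE = 1024
IMAGE_SIZE = 256  # inputs and LDR/HDR environment maps resized to 256x256
```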