OBJECT 3DIT: Language-guided 3D-aware Image Editing
Authors: Oscar Michel, Anand Bhattad, Eli VanderBilt, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also introduce 3DIT: single and multi-task models for four editing tasks. Our models show impressive abilities to understand the 3D composition of entire scenes, factoring in surrounding objects, surfaces, lighting conditions, shadows, and physically-plausible object configurations. Training on our new benchmark OBJECT, 3DIT remarkably generalizes to images in the CLEVR dataset as well as the real world. |
| Researcher Affiliation | Collaboration | Oscar Michel (1), Anand Bhattad (2), Eli VanderBilt (1), Ranjay Krishna (1,3), Aniruddha Kembhavi (1), Tanmay Gupta (1); (1) Allen Institute for Artificial Intelligence, (2) University of Illinois Urbana-Champaign, (3) University of Washington |
| Pseudocode | No | The paper describes the model architecture and training process in text but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | More information can be found on the project page at https://prior.allenai.org/projects/object-edit. This URL is a project page, not an explicit link to a code repository for the methodology described. |
| Open Datasets | Yes | To promote progress towards this goal, we release OBJECT: a dataset consisting of 400K editing examples created from procedurally generated 3D scenes. |
| Dataset Splits | Yes | We generate 100k training examples for each task, and 1024 scenes for validation and testing. |
| Hardware Specification | Yes | This batch size is achieved by using a local batch size of 64 across 40GB NVIDIA RTX A6000 GPUs, along with two gradient accumulation steps. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and building upon Stable Diffusion and Zero-1-to-3, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | Our approach uses an effective batch size of 1024... We train on images with a resolution of 256×256... We utilize the AdamW optimizer, with a learning rate of 1e-4 for all parameters of the model except for those of the concatenation MLP, which uses a learning rate of 1e-3. Our training process runs for a total of 20,000 steps... For inference, we generate images with the DDIM [68] sampler using 200 steps. We do not use classifier-free guidance, i.e. the cfg term is set to 1.0. |
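The hyperparameters reported in the Hardware Specification and Experiment Setup rows can be collected into a small configuration sketch. This is an illustrative reconstruction, not the authors' code: the names `TrainConfig`, `InferenceConfig`, `param_groups`, and the `"concat_mlp"` name substring are assumptions, and the GPU count is derived from the stated batch-size arithmetic rather than quoted from the paper.

```python
# Hypothetical configuration sketch assembled from the reproducibility table;
# all identifiers are illustrative, not taken from the authors' repository.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    effective_batch_size: int = 1024  # stated effective batch size
    local_batch_size: int = 64        # per-GPU batch size
    grad_accum_steps: int = 2         # gradient accumulation steps
    image_resolution: int = 256       # 256x256 training images
    lr_main: float = 1e-4             # AdamW LR for most parameters
    lr_concat_mlp: float = 1e-3      # higher LR for the concatenation MLP
    train_steps: int = 20_000

    @property
    def implied_num_gpus(self) -> int:
        # 1024 / (64 * 2) = 8; the GPU count is not quoted explicitly,
        # so this value is derived, not confirmed.
        return self.effective_batch_size // (
            self.local_batch_size * self.grad_accum_steps
        )


@dataclass
class InferenceConfig:
    sampler: str = "DDIM"
    sampling_steps: int = 200
    cfg_scale: float = 1.0  # no classifier-free guidance


def param_groups(named_params, cfg: TrainConfig):
    """Split parameters into two AdamW groups as the paper describes:
    the concatenation MLP at 1e-3, all remaining parameters at 1e-4.
    The "concat_mlp" name filter is an assumed naming convention."""
    mlp, rest = [], []
    for name, p in named_params:
        (mlp if "concat_mlp" in name else rest).append(p)
    return [
        {"params": rest, "lr": cfg.lr_main},
        {"params": mlp, "lr": cfg.lr_concat_mlp},
    ]
```

A list like this could be passed directly to `torch.optim.AdamW`, which accepts per-group learning rates via its parameter-group dictionaries.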