SceneCraft: Layout-Guided 3D Scene Generation
Authors: Xiuyu Yang, Yunze Man, Junkun Chen, Yu-Xiong Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experimental analysis, we demonstrate that our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality. Trained with multi-view indoor scene datasets [49, 72], our work achieves state-of-the-art 3D indoor scene generation performance, both quantitatively and qualitatively. |
| Researcher Affiliation | Academia | 1 Shanghai Jiao Tong University 2 University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We promise that we will open-source the data and code after paper acceptance. |
| Open Datasets | Yes | We use multi-view images from ScanNet++ [72] and Hypersim [49] to construct BBI data. Our processed data are publicly available: Layout ScanNet++: https://huggingface.co/datasets/gzzyyxy/layout_diffusion_scannetpp_voxel0.2 ; Layout Hypersim: https://huggingface.co/datasets/gzzyyxy/layout_diffusion_hypersim |
| Dataset Splits | No | The paper mentions splitting generation tasks but does not provide specific percentages or counts for train/validation/test splits of the datasets (ScanNet++ and Hypersim). |
| Hardware Specification | Yes | For finetuning the diffusion model, we use a total batch size of 16 on 2 NVIDIA A6000 GPUs with a constant learning rate of 5e-5, training for around 10k iterations. For the scene generation task, we use 2 A6000 GPUs to perform all our experiments. |
| Software Dependencies | No | The paper mentions software like Stable Diffusion, NeRFStudio, and Control Nets but does not provide specific version numbers for these or other dependencies. |
| Experiment Setup | Yes | For finetuning the diffusion model, we use a total batch size of 16 on 2 NVIDIA A6000 GPUs with a constant learning rate of 5e-5, training for around 10k iterations. For NeRF training, we use a constant learning rate of 1e-2 for proposal networks and 1e-3 for fields. |
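The hardware and experiment-setup rows above can be collected into a small configuration sketch. This is a minimal illustration assuming the hyperparameters quoted from the paper; the names (`FINETUNE_CONFIG`, `NERF_CONFIG`, `per_gpu_batch_size`) are hypothetical and do not come from the authors' code, which had not been released at the time of the report.

```python
# Hypothetical reproduction config, assembled from the hyperparameters the
# paper reports; structure and names are illustrative assumptions.
FINETUNE_CONFIG = {
    "total_batch_size": 16,   # across 2 NVIDIA A6000 GPUs
    "learning_rate": 5e-5,    # constant schedule
    "iterations": 10_000,     # "around 10k iterations"
}

NERF_CONFIG = {
    "proposal_lr": 1e-2,      # constant, proposal networks
    "fields_lr": 1e-3,        # constant, fields
}

def per_gpu_batch_size(cfg: dict, num_gpus: int = 2) -> int:
    """Derive the per-GPU batch size implied by the reported total."""
    return cfg["total_batch_size"] // num_gpus
```

With the reported 2-GPU setup, `per_gpu_batch_size(FINETUNE_CONFIG)` gives 8 samples per GPU.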