SceneCraft: Layout-Guided 3D Scene Generation

Authors: Xiuyu Yang, Yunze Man, Junkun Chen, Yu-Xiong Wang

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Through experimental analysis, we demonstrate that our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality. Trained with multi-view indoor scene datasets [49, 72], our work achieves state-of-the-art 3D indoor scene generation performance, both quantitatively and qualitatively." |
| Researcher Affiliation | Academia | 1. Shanghai Jiao Tong University; 2. University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | "We promise that we will open-source the data and code after paper acceptance." |
| Open Datasets | Yes | "We use multi-view images from ScanNet++ [72] and Hypersim [49] to construct BBI data. Our processed data are publicly available." Layout ScanNet++: https://huggingface.co/datasets/gzzyyxy/layout_diffusion_scannetpp_voxel0.2 ; Layout Hypersim: https://huggingface.co/datasets/gzzyyxy/layout_diffusion_hypersim |
| Dataset Splits | No | The paper mentions splitting generation tasks but does not provide specific percentages or counts for train/validation/test splits of the datasets (ScanNet++ and Hypersim). |
| Hardware Specification | Yes | "For finetuning the diffusion model, we use a total batch size of 16 on 2 NVIDIA A6000 GPUs with a constant learning rate of 5e-5, training for around 10k iterations. For the scene generation task, we use 2 A6000 GPUs to perform all our experiments." |
| Software Dependencies | No | The paper mentions software like Stable Diffusion, NeRFStudio, and ControlNet but does not provide specific version numbers for these or other dependencies. |
| Experiment Setup | Yes | "For finetuning the diffusion model, we use a total batch size of 16 on 2 NVIDIA A6000 GPUs with a constant learning rate of 5e-5, training for around 10k iterations. For NeRF training, we use a constant learning rate of 1e-2 for proposal networks and 1e-3 for fields." |
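For reference, the hyperparameters quoted in the Hardware Specification and Experiment Setup rows can be collected into a single configuration sketch. This is a hypothetical structure for a reimplementation attempt; the key names are illustrative and do not come from the authors' (unreleased) code.

```python
# Hypothetical config collecting the hyperparameters reported in the paper.
# Key names are illustrative assumptions, not the authors' actual config schema.
config = {
    "diffusion_finetune": {
        "total_batch_size": 16,
        "hardware": "2x NVIDIA A6000",
        "learning_rate": 5e-5,      # constant schedule
        "iterations": 10_000,       # "around 10k iterations"
    },
    "nerf_training": {
        "proposal_networks_lr": 1e-2,  # constant schedule
        "fields_lr": 1e-3,             # constant schedule
    },
}

# Quick sanity check that the learning rates match the quoted setup.
assert config["diffusion_finetune"]["learning_rate"] == 5e-5
assert config["nerf_training"]["fields_lr"] == 1e-3
```

Unreported details (optimizer, warmup, NeRF iteration count, per-GPU batch size) are left out rather than guessed.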