InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

Authors: Chenguo Lin, Yadong Mu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results reveal that the proposed method surpasses existing state-of-the-art approaches by a large margin. Thorough ablation studies confirm the efficacy of crucial design components.
Researcher Affiliation | Academia | Chenguo Lin, Yadong Mu (Peking University); chenguolin@stu.pku.edu.cn, myd@pku.edu.cn
Pseudocode | No | No explicit pseudocode or algorithm blocks found.
Open Source Code | Yes | Project page: https://chenguolin.github.io/projects/InstructScene. Our instruction-scene pair dataset and code for both training and evaluation can be found in https://chenguolin.github.io/projects/InstructScene.
Open Datasets | Yes | To fit practical scenarios and promote the benchmarking of instruction-driven scene synthesis, we curate a high-quality dataset containing paired scenes and instructions with the help of large language and multimodal models (Li et al., 2022; Ouyang et al., 2022; OpenAI, 2023). Our instruction-scene pair dataset and code for both training and evaluation can be found in https://chenguolin.github.io/projects/InstructScene.
Dataset Splits | Yes | We use the same data split for training and evaluation as ATISS (Paschalidou et al., 2021).
Hardware Specification | Yes | Our method takes about 12 seconds to generate a batch of 128 living rooms on a single A40 GPU.
Software Dependencies | No | The paper mentions software such as OpenShape, CLIP, Blender, and the clean-fid library, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | We use 5-layer and 8-head Transformers with 512 attention dimensions and a dropout rate of 0.1 for all generative models in this work. They are trained by the AdamW optimizer (Loshchilov & Hutter, 2018) for 500,000 iterations with a batch size of 128, a learning rate of 1e-4, and a weight decay of 0.02. The exponential moving average (EMA) technique (Polyak & Juditsky, 1992; Ho et al., 2020) with a decay factor of 0.9999 is applied to the model parameters.
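
For concreteness, the reported training configuration maps onto a standard PyTorch setup roughly as follows. This is a minimal sketch under assumed details, not the authors' implementation: the encoder-style Transformer stack, the dummy batch, and the squared-norm loss are hypothetical placeholders standing in for the paper's actual diffusion model and objective.

```python
import copy
import torch
import torch.nn as nn

# Stand-in for the paper's generative Transformer: 5 layers, 8 heads,
# 512 attention dimensions, dropout 0.1. The exact architecture
# (decoder vs. encoder, conditioning) is not specified in this
# excerpt, so a plain encoder stack is assumed here.
layer = nn.TransformerEncoderLayer(
    d_model=512, nhead=8, dropout=0.1, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=5)

# AdamW with the reported learning rate and weight decay.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, weight_decay=0.02)

# EMA shadow copy of the parameters with decay factor 0.9999.
ema_model = copy.deepcopy(model).requires_grad_(False)

@torch.no_grad()
def ema_update(ema, online, decay=0.9999):
    # lerp_ implements p_ema <- decay * p_ema + (1 - decay) * p.
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.lerp_(p, 1.0 - decay)

# Skeleton of the reported schedule: 500,000 iterations at batch
# size 128. The random batch and squared-norm loss are hypothetical
# placeholders for the paper's training data and diffusion objective.
for step in range(500_000):
    batch = torch.randn(128, 16, 512)   # (batch, seq, dim) dummy input
    loss = model(batch).pow(2).mean()   # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(ema_model, model)
```

The EMA update shown is the standard rule implied by a 0.9999 decay factor; at evaluation time one would sample from `ema_model` rather than the online model, which is the common practice the cited diffusion work (Ho et al., 2020) follows.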