InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
Authors: Chenguo Lin, Yadong Mu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results reveal that the proposed method surpasses existing state-of-the-art approaches by a large margin. Thorough ablation studies confirm the efficacy of crucial design components. |
| Researcher Affiliation | Academia | Chenguo Lin, Yadong Mu Peking University chenguolin@stu.pku.edu.cn, myd@pku.edu.cn |
| Pseudocode | No | No explicit pseudocode or algorithm blocks found. |
| Open Source Code | Yes | Project page: https://chenguolin.github.io/projects/InstructScene. "Our instruction-scene pair dataset and code for both training and evaluation can be found in https://chenguolin.github.io/projects/InstructScene." |
| Open Datasets | Yes | To fit practical scenarios and promote the benchmarking of instruction-driven scene synthesis, we curate a high-quality dataset containing paired scenes and instructions with the help of large language and multimodal models (Li et al., 2022; Ouyang et al., 2022; OpenAI, 2023). Our instruction-scene pair dataset and code for both training and evaluation can be found in https://chenguolin.github.io/projects/InstructScene. |
| Dataset Splits | Yes | We use the same data split for training and evaluation as ATISS (Paschalidou et al., 2021). |
| Hardware Specification | Yes | Our method takes about 12 seconds to generate a batch of 128 living rooms on a single A40 GPU. |
| Software Dependencies | No | The paper mentions software such as OpenShape, CLIP, Blender, and the clean-fid library, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | We use 5-layer, 8-head Transformers with 512 attention dimensions and a dropout rate of 0.1 for all generative models in this work. They are trained with the AdamW optimizer (Loshchilov & Hutter, 2018) for 500,000 iterations with a batch size of 128, a learning rate of 1e-4, and a weight decay of 0.02. The exponential moving average (EMA) technique (Polyak & Juditsky, 1992; Ho et al., 2020) with a decay factor of 0.9999 is applied to the model parameters. |
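The reported experiment setup can be summarized as a training configuration plus an EMA update on the model parameters. The sketch below is a minimal, illustrative reconstruction, not the authors' actual code: the config keys and the `EMA` class are hypothetical names, and only the hyperparameter values and the standard EMA rule (shadow ← decay·shadow + (1 − decay)·param) come from the paper's description.

```python
# Hypothetical summary of the training setup reported in the paper.
# Key names are illustrative; values are taken from the Experiment Setup row.
TRAIN_CONFIG = {
    "transformer_layers": 5,
    "attention_heads": 8,
    "attention_dim": 512,
    "dropout": 0.1,
    "optimizer": "AdamW",
    "iterations": 500_000,
    "batch_size": 128,
    "learning_rate": 1e-4,
    "weight_decay": 0.02,
    "ema_decay": 0.9999,
}


class EMA:
    """Exponential moving average of model parameters.

    After each optimizer step: shadow <- decay * shadow + (1 - decay) * param.
    The paper uses decay = 0.9999; a smaller decay is used in the toy
    example below so the effect is visible after one step.
    """

    def __init__(self, params, decay=0.9999):
        self.decay = decay
        self.shadow = list(params)  # copy of the initial parameter values

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * p for s, p in zip(self.shadow, params)]


# Toy example: one EMA step on a single scalar "parameter".
ema = EMA([0.0], decay=0.5)
ema.update([1.0])
print(ema.shadow[0])  # 0.5 = 0.5 * 0.0 + 0.5 * 1.0
```

In practice the shadow parameters, not the raw ones, are used at evaluation time; with decay 0.9999 the shadow weights change very slowly, smoothing out noise from individual gradient steps.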