Simple and Effective Synthesis of Indoor 3D Scenes
Authors: Jing Yu Koh, Harsh Agrawal, Dhruv Batra, Richard Tucker, Austin Waters, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the Matterport3D and RealEstate10K datasets, our approach significantly outperforms prior work when evaluated by humans, as well as on FID scores. |
| Researcher Affiliation | Collaboration | 1 Google Research 2 Georgia Institute of Technology 3 University of Michigan 4 Apple |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our code is publicly released to facilitate generative data augmentation and applications to downstream robotics and embodied AI tasks. https://github.com/google-research/se3ds |
| Open Datasets | Yes | We conduct experiments on two datasets of diverse indoor environments: Matterport3D (Chang et al. 2017), which contains 3D meshes of 90 buildings reconstructed from 11K high-resolution RGB-D panoramas (panos), and RealEstate10K (Zhou et al. 2018b), a collection of up to 10,000 YouTube video walkthroughs of real estate properties. |
| Dataset Splits | Yes | Evaluations are based on Val-Seen and Val-Unseen splits of the Room-to-Room (R2R) dataset (Anderson et al. 2018b), which are comprised of sequences of adjacent panoramas (~2.2m apart). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like ResNet-101, MiDaS, and VLN BERT agent, but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | In all experiments we train the model end-to-end from scratch, and we randomly mask up to 75% of the input guidance image for data augmentation. ... train and evaluate with an image resolution of 256 × 256. |
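
The masking augmentation quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_mask_guidance`, the uniform sampling of the masked fraction, and the per-pixel (rather than patch-based) mask are all assumptions made for illustration. Only the 75% upper bound and the 256 × 256 resolution come from the excerpt above.

```python
import numpy as np


def random_mask_guidance(image, max_fraction=0.75, rng=None):
    """Zero out a random subset of pixels covering at most `max_fraction` of the image.

    Hypothetical helper: the paper excerpt only states that up to 75% of the
    guidance image is masked, not how the masked region is chosen.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]

    # Assumption: the masked fraction is drawn uniformly from [0, max_fraction).
    fraction = rng.uniform(0.0, max_fraction)
    num_masked = int(fraction * height * width)

    # Pick distinct pixel positions to mask; True in `keep` means "pixel kept".
    flat_keep = np.ones(height * width, dtype=bool)
    masked_idx = rng.choice(height * width, size=num_masked, replace=False)
    flat_keep[masked_idx] = False
    keep = flat_keep.reshape(height, width)

    # Broadcast the 2D mask over the channel axis.
    masked = image * keep[..., None].astype(image.dtype)
    return masked, keep


# Usage: a dummy 256 x 256 RGB guidance image filled with ones.
image = np.ones((256, 256, 3), dtype=np.float32)
masked, keep = random_mask_guidance(image, rng=np.random.default_rng(0))
```

Returning the boolean `keep` mask alongside the masked image lets a training loop tell the model which guidance pixels are valid. A patch-based or rectangular mask would be an equally plausible reading of the excerpt.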