SQA3D: Situated Question Answering in 3D Scenes
Authors: Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate various state-of-the-art approaches and find that the best one only achieves an overall score of 47.20%, while amateur human participants can reach 90.06%. We believe SQA3D could facilitate future embodied AI research with stronger situation understanding and reasoning capabilities. Code and data are released at sqa3d.github.io. |
| Researcher Affiliation | Collaboration | ¹Beijing Institute for General Artificial Intelligence (BIGAI), ²UCLA, ³Tsinghua University, ⁴Peking University |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data are released at sqa3d.github.io. |
| Open Datasets | Yes | Based upon 650 scenes from ScanNet, we provide a dataset centered around 6.8k unique situations, along with 20.4k descriptions and 33.4k diverse reasoning questions for these situations... Code and data are released at sqa3d.github.io. |
| Dataset Splits | Yes | We follow the practice of ScanNet and split SQA3D into train, val, and test sets... The statistics of these splits can be found in Table 2... Table 2 (train/val/test): Total s^txt 16,229/1,997/2,143; Total q 26,623/3,261/3,519; Unique q 20,183/2,872/3,036; Total scenes 518/65/67; Total objects 11,723/1,550/1,652 |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., specific GPU models, CPU models, or cloud instance types). |
| Software Dependencies | No | The paper mentions using specific models like 'ScanQA', 'ClipBERT', and 'MCAN', but it does not provide specific version numbers for the software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We adopt most of their default hyper-parameters and the details can be found in appendix... C.2 HYPER-PARAMETERS... Table 4: Hyper-parameters for the considered models: Batch size 16; Total training epochs 50; Number of layers for transformer 2; Learning rate 5e-4 (captured in the config sketch below the table). |
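
For anyone attempting a reproduction, the reported numbers above are enough to seed a training configuration and a dataset-loading sanity check. Below is a minimal sketch: only the numeric values are taken from Tables 2 and 4 of the paper, while every identifier and helper name is hypothetical and not part of the authors' released code at sqa3d.github.io.

```python
# Minimal reproduction sketch. Hyper-parameter values come from Table 4 and
# split sizes from Table 2 of the SQA3D paper; all names here are hypothetical.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    batch_size: int = 16          # Table 4
    total_epochs: int = 50        # Table 4
    transformer_layers: int = 2   # Table 4
    learning_rate: float = 5e-4   # Table 4


# Reported total question counts per split (Table 2), usable as a sanity
# check after downloading the dataset from sqa3d.github.io.
EXPECTED_TOTAL_QUESTIONS = {"train": 26_623, "val": 3_261, "test": 3_519}


def check_split_size(split: str, questions: list) -> None:
    """Raise if a loaded split does not match the paper's reported count."""
    expected = EXPECTED_TOTAL_QUESTIONS[split]
    if len(questions) != expected:
        raise ValueError(
            f"{split}: loaded {len(questions)} questions, expected {expected}"
        )
```

Since the paper does not pin software versions, a check like this is a cheap way to confirm that a re-download or re-split of the data matches what the authors evaluated on.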