ScanERU: Interactive 3D Visual Grounding Based on Embodied Reference Understanding

Authors: Ziyang Lu, Yunqiang Pei, Guoqing Wang, Peiwei Li, Yang Yang, Yinjie Lei, Heng Tao Shen

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the superiority of the proposed method, especially in the recognition of multiple identical objects.
Researcher Affiliation | Academia | (1) University of Electronic Science and Technology of China; (2) University of Science and Technology of China; (3) Sichuan University
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our codes and dataset are available in the ScanERU repository. Our project page: https://github.com/MrLearnedToad/ScanERU
Open Datasets | Yes | Our codes and dataset are available in the ScanERU repository. Our project page: https://github.com/MrLearnedToad/ScanERU ... Our dataset is based on the ScanRefer (Chen, Chang, and Nießner 2020) and ScanNet (Dai et al. 2017) datasets and includes 706 unique indoor scenes, 9,929 referred objects, and 46,173 descriptions.
Dataset Splits | Yes | In our experimental evaluation, we conduct tests on the ScanERU dataset. Following the same protocol as the ScanRefer dataset, we split it into train and validation sets with 36,665 and 9,508 samples, respectively. (A split-count check is sketched after this table.)
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific CPU/GPU models) used to run the experiments; it only mentions the Azure Kinect DK camera used to acquire data for the test set.
Software Dependencies | No | The paper mentions 'Open3D (Zhou, Park, and Koltun 2018)' and a 'Blender script', but it does not give version numbers for these or for other software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup | No | The paper describes components of the experimental setup, such as the dataset splits and the structure of the loss function, and references previous work for some modules, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed training configurations.
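
As a concrete reference for the Dataset Splits row, below is a minimal sketch of how one might verify the reported train/validation split sizes (36,665 and 9,508 samples). It assumes a simple JSON-based annotation layout; the file names `ScanERU_train.json` and `ScanERU_val.json` are hypothetical and should be checked against the ScanERU repository linked above.

```python
# Hypothetical verification sketch: count the samples in each ScanERU split file
# and compare against the sizes reported in the paper (36,665 train / 9,508 val).
# File names and file structure are assumptions, not taken from the paper.
import json

def count_samples(path: str) -> int:
    """Return the number of referring-expression entries in one split file."""
    with open(path, "r", encoding="utf-8") as f:
        return len(json.load(f))

if __name__ == "__main__":
    expected = {"ScanERU_train.json": 36665, "ScanERU_val.json": 9508}
    for path, reported in expected.items():
        found = count_samples(path)
        status = "OK" if found == reported else "MISMATCH"
        print(f"{path}: {found} samples (paper reports {reported}) -> {status}")
```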