Text-Guided Graph Neural Networks for Referring 3D Instance Segmentation
Authors: Pin-Hao Huang, Han-Hung Lee, Hwann-Tzong Chen, Tyng-Luh Liu
AAAI 2021, pp. 1610-1618
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method achieves state-of-the-art performance on referring 3D instance segmentation and 3D localization on the ScanRefer, Nr3D, and Sr3D benchmarks. |
| Researcher Affiliation | Collaboration | 1 Institute of Information Science, Academia Sinica, Taiwan 2 Department of Computer Science, National Tsing Hua University, Taiwan 3 Taiwan AI Labs 4 Aeolus Robotics |
| Pseudocode | Yes | Algorithm 1 Sequential Re-sampling for Instance Masks |
| Open Source Code | No | The paper does not explicitly state that source code for the proposed method is publicly available. |
| Open Datasets | Yes | We evaluate our method using recent 3D referring datasets including ScanRefer (Chen, Chang, and Nießner 2020) and Nr3D/Sr3D of ReferIt3D (Achlioptas et al. 2020). The datasets are based on ScanNetv2 (Dai et al. 2017), which contains 1,513 richly-annotated 3D reconstructions of indoor scenes. |
| Dataset Splits | Yes | These datasets all follow the official ScanNet splits. |
| Hardware Specification | No | The paper mentions 'pre-train a sparse 3D UNet feature extractor' but does not specify any hardware details like CPU, GPU models, or memory. |
| Software Dependencies | No | The paper mentions various models and networks (GloVe, GRU, BERT, MLP, Sparse 3D UNet) but does not provide specific version numbers for any software or libraries. |
| Experiment Setup | Yes | For the experiments using GRU as the language extractor, we use a batch size of 8 and an initial learning rate of 0.001 with decay of 0.1 every 100 epochs. The maximum timestep and sentence length for GRU are set to 80. For the experiments with BERT (Vaswani et al. 2017; Devlin et al. 2018), the weights of the BERT model and TGNN are updated separately. The initial learning rate is 0.0002 for BERT with decay of 0.5 every 10 epochs, while the initial learning rate is 0.001 for TGNN with decay of 0.5 every 50 epochs. The batch size is 16, and the maximum sentence length is 80 as in GRU. The number of nearest-neighbors is 16 unless specified. The number of layers in the GNN is set to 3. |