Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention

Authors: Haomeng Zhang, Chiao-An Yang, Raymond A. Yeh

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Empirically, experiments show that our method outperforms the state-of-the-art methods on multi-object 3D grounding by 12.8% (absolute) and is competitive in single-object 3D grounding." |
| Researcher Affiliation | Academia | Department of Computer Science, Purdue University |
| Pseudocode | No | The paper includes architectural diagrams (Figures 1 and 2) but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/haomengz/D-LISA |
| Open Datasets | Yes | "We conduct experiments on the Multi3DRefer [52] dataset. We also compare our model with other two-stage methods on single-object grounding using the ScanRefer [8] and the Nr3D [2] datasets." |
| Dataset Splits | Yes | "We follow the same train/val set split as the baselines [52]." |
| Hardware Specification | Yes | "We train our model on a single NVIDIA A100 GPU." |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer and CLIP with ViT-B/32 but does not provide version numbers for these or for other key dependencies (e.g., Python, PyTorch/TensorFlow). |
| Experiment Setup | Yes | "We set the batch size to 4 with the AdamW optimizer using a learning rate of 5e-4. We set the dynamic proposal loss coefficient α_dyn to 5. We set τ_train to 0.25 and search for the optimal value of τ_pred over {0.05, 0.1, 0.15, 0.2, 0.25} during evaluation for M3DRef-CLIP w/ NMS and our model." |
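The experiment-setup row above can be summarized as a minimal configuration sketch. The dictionary keys, the `select_tau_pred` helper, and the `evaluate` callable are illustrative assumptions for clarity; they are not the authors' actual code, which uses the reported values inside a full training pipeline.

```python
# Hyperparameters reported in the paper's experiment setup (values from the table above).
config = {
    "batch_size": 4,
    "optimizer": "AdamW",
    "learning_rate": 5e-4,
    "alpha_dyn": 5,      # dynamic proposal loss coefficient
    "tau_train": 0.25,   # confidence threshold fixed during training
}

# Candidate values searched for tau_pred during evaluation.
TAU_PRED_CANDIDATES = [0.05, 0.1, 0.15, 0.2, 0.25]


def select_tau_pred(evaluate, candidates=TAU_PRED_CANDIDATES):
    """Return the threshold with the best validation score.

    `evaluate` stands in for the model's validation routine
    (hypothetical interface; the paper does not specify one).
    """
    return max(candidates, key=evaluate)


# Usage with a dummy scoring function that peaks at 0.1:
best = select_tau_pred(lambda t: -abs(t - 0.1))  # → 0.1
```

The grid search mirrors the paper's stated procedure: each candidate threshold is scored on the validation split and the best-scoring value is kept for reporting.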