Multi-Object 3D Grounding with Dynamic Modules and Language-Informed Spatial Attention
Authors: Haomeng Zhang, Chiao-An Yang, Raymond A. Yeh
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, experiments show that our method outperforms the state-of-the-art methods on multi-object 3D grounding by 12.8% (absolute) and is competitive in single-object 3D grounding. |
| Researcher Affiliation | Academia | Department of Computer Science, Purdue University |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1 and Figure 2) but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/haomengz/D-LISA |
| Open Datasets | Yes | We conduct experiments on the Multi3DRefer [52] dataset. We also compare our model with other two-stage methods on single-object grounding using the ScanRefer [8] and the Nr3D [2] datasets. |
| Dataset Splits | Yes | We follow the same train/val set split as the baselines [52]. |
| Hardware Specification | Yes | We train our model on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions software components like 'AdamW optimizer' and 'CLIP with ViT-B/32' but does not provide specific version numbers for these or other key software dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | We set the batch size to 4 with the AdamW optimizer using a learning rate of 5e-4. We set the dynamic proposal loss coefficient α_dyn to 5. We set τ_train to 0.25 and search for the optimal value of τ_pred over {0.05, 0.1, 0.15, 0.2, 0.25} during evaluation for M3DRef-CLIP w/ NMS and our model. |
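For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a minimal configuration sketch. This is a hypothetical layout for illustration only; the key names and the `select_tau_pred` helper are assumptions, not the authors' actual code, and the paper does not specify its config format.

```python
# Hypothetical training configuration assembled from the values quoted above.
# Key names are illustrative; only the numeric values come from the paper.
config = {
    "batch_size": 4,
    "optimizer": "AdamW",
    "learning_rate": 5e-4,
    "alpha_dyn": 5,            # dynamic proposal loss coefficient
    "tau_train": 0.25,
    # tau_pred is searched over this grid during evaluation
    "tau_pred_grid": [0.05, 0.1, 0.15, 0.2, 0.25],
}

def select_tau_pred(eval_fn, grid):
    """Pick the tau_pred value from the grid that maximizes a
    caller-supplied validation metric (eval_fn: threshold -> score)."""
    return max(grid, key=eval_fn)
```

The grid search mirrors the paper's stated procedure of choosing the best τ_pred on the validation set; the actual evaluation metric is not specified here, so `eval_fn` is left as a placeholder.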