Cross-Modal Match for Language Conditioned 3D Object Grounding

Authors: Yachao Zhang, Runze Hu, Ronghui Li, Yanyun Qu, Yuan Xie, Xiu Li

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method in mainstream evaluation settings on three datasets, and the results demonstrate the effectiveness of the proposed method.
Researcher Affiliation | Academia | (1) Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China; (2) School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China; (3) School of Informatics, Xiamen University, Xiamen 361000, China; (4) School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
Pseudocode | No | The paper does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We leverage three recently released datasets, i.e., Nr3D (Achlioptas et al. 2020), Sr3D (Achlioptas et al. 2020), and ScanRefer (Chen, Chang, and Nießner 2020), built on the 3D scenes of ScanNet (Dai et al. 2017), to evaluate performance. We follow the official split for training and validation.
Dataset Splits | Yes | We follow the official split for training and validation. Additional validation subsets are also used: for the Nr3D and Sr3D datasets, two further splits are introduced at evaluation time. (1) By the number of distractors (more distractors indicate greater difficulty), sentences are split into an easy subset (at most 2 distractors) and a hard subset (more than 2 distractors). (2) By whether the sentence requires a specific viewpoint to ground the referred object, the data is partitioned into view-dependent and view-independent subsets.
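The two evaluation splits above can be sketched as simple partitions. This is a minimal illustration, not the paper's released tooling; the per-sample fields `num_distractors` and `view_dependent` are assumed names for the properties described in the report.

```python
# Hypothetical sketch of the Nr3D/Sr3D evaluation splits.
# Field names (`num_distractors`, `view_dependent`) are assumptions,
# not taken from any released data format.

def split_by_difficulty(samples):
    """Easy: at most 2 distractors; hard: more than 2 distractors."""
    easy = [s for s in samples if s["num_distractors"] <= 2]
    hard = [s for s in samples if s["num_distractors"] > 2]
    return easy, hard

def split_by_view(samples):
    """View-dependent vs. view-independent sentences."""
    dep = [s for s in samples if s["view_dependent"]]
    indep = [s for s in samples if not s["view_dependent"]]
    return dep, indep

# Toy examples for illustration only.
samples = [
    {"num_distractors": 1, "view_dependent": False},
    {"num_distractors": 4, "view_dependent": True},
    {"num_distractors": 2, "view_dependent": True},
]
easy, hard = split_by_difficulty(samples)
dep, indep = split_by_view(samples)
```

Note that the two partitions are independent: every sentence belongs to exactly one difficulty subset and exactly one view subset.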
Hardware Specification | Yes | It is trained and evaluated on one NVIDIA RTX 3090 GPU with 24GB RAM.
Software Dependencies | Yes | We implement our model using PyTorch based on Python 3.8.
Experiment Setup | Yes | We set the batch size to 128 and the learning rate to 0.0005, with a warm-up of 5,000 iterations and cosine decay scheduling. Our model is trained for 100 epochs using the Adam optimizer. We directly set αa = 1 and αb = 1. We set the BEV grid size w to 0.5m.
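The reported learning-rate schedule (base LR 0.0005, 5,000-iteration warm-up, cosine decay) can be sketched as a standalone function. This is a hedged sketch: the paper does not specify the warm-up shape or the decay floor, so linear warm-up and decay to zero are assumptions here, and `total_iters` is a placeholder for the full training length.

```python
import math

# Assumed schedule shape: linear warm-up for 5,000 iterations to the
# reported base LR of 5e-4, then cosine decay to zero over the remaining
# iterations. The warm-up curve and zero floor are assumptions.
BASE_LR = 5e-4
WARMUP_ITERS = 5_000

def lr_at(step, total_iters):
    """Learning rate at a given optimizer step."""
    if step < WARMUP_ITERS:
        return BASE_LR * step / WARMUP_ITERS  # linear warm-up
    progress = (step - WARMUP_ITERS) / (total_iters - WARMUP_ITERS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))  # cosine decay
```

In a PyTorch training loop this would typically be applied per step by setting `param_group["lr"]` on an `Adam` optimizer, or expressed as a `LambdaLR` multiplier.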