Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding

Authors: Taolin Zhang, Sunan He, Tao Dai, Zhi Wang, Bin Chen, Shu-Tao Xia

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the excellent performance of 3DVLP on three 3D vision-language tasks, reflecting its superiority in semantic 3D scene understanding.
Researcher Affiliation | Academia | 1 Tsinghua Shenzhen International Graduate School, Tsinghua University; 2 Department of Computer Science and Engineering, Hong Kong University of Science and Technology; 3 College of Computer Science and Software Engineering, Shenzhen University; 4 Harbin Institute of Technology, Shenzhen; 5 Research Center of Artificial Intelligence, Peng Cheng Laboratory
Pseudocode | No | The paper describes the methodology in prose and uses diagrams to illustrate concepts, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/iridescentttt/3DVLP.
Open Datasets | Yes | Visual Grounding Dataset: We select the benchmark dataset ScanRefer (Chen, Chang, and Nießner 2020) for the visual grounding task. It consists of 800 3D scenes from the ScanNet dataset (Dai et al. 2017).
Dataset Splits | No | The paper mentions benchmark datasets such as ScanRefer, ScanNet, Scan2Cap, and ScanQA, but does not explicitly detail the training, validation, or test splits used for these datasets.
Hardware Specification | Yes | Codes are implemented by PyTorch and run on an Nvidia 3090 GPU.
Software Dependencies | No | Codes are implemented by PyTorch and run on an Nvidia 3090 GPU. (No version is specified for PyTorch or other dependencies.)
Experiment Setup | Yes | We first train 3DVLP over the proposed proxy tasks including visual grounding, OCC and OSC in the pre-training stage for 200 epochs. We set the batch size as 8 and the initial learning rate is set to be 0.002 for the detector and 5e-4 for other modules in the 3DVLP. (A hedged configuration sketch follows after the table.)
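
The quoted setup specifies only the schedule (200 pre-training epochs, batch size 8) and per-module learning rates (0.002 for the detector, 5e-4 for the remaining modules). Below is a minimal PyTorch sketch of such a two-group optimizer configuration, assuming a model with a `detector` submodule; the module names, the optimizer choice (AdamW), and the placeholder loss are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the quoted pre-training schedule; not 3DVLP's actual code.
import torch
import torch.nn as nn

# Placeholder modules standing in for 3DVLP's detector and remaining components.
class DummyDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(128, 64)

    def forward(self, x):
        return self.backbone(x)

class Dummy3DVLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.detector = DummyDetector()          # quoted lr: 0.002
        self.other_modules = nn.Linear(64, 32)   # quoted lr: 5e-4

    def forward(self, x):
        return self.other_modules(self.detector(x))

model = Dummy3DVLP()

# Two parameter groups mirror the quoted per-module learning rates.
# The optimizer choice (AdamW) is an assumption; the paper excerpt does not name one.
optimizer = torch.optim.AdamW([
    {"params": model.detector.parameters(), "lr": 2e-3},
    {"params": model.other_modules.parameters(), "lr": 5e-4},
])

EPOCHS, BATCH_SIZE = 200, 8  # quoted pre-training schedule
for epoch in range(EPOCHS):
    points = torch.randn(BATCH_SIZE, 128)   # stand-in for a point-cloud feature batch
    loss = model(points).pow(2).mean()      # placeholder for the proxy-task losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Parameter groups let a single optimizer apply different learning rates to the detector and to the other modules, which is one straightforward way to realize the split described in the quoted setup.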