Mono3DVG: 3D Visual Grounding in Monocular Images

Authors: Yang Zhan, Yuan Yuan, Zhitong Xiong

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Comprehensive benchmarks and some insightful analyses are provided for Mono3DVG. Extensive comparisons and ablation studies show that our method significantly outperforms all baselines." |
| Researcher Affiliation | Academia | "¹School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China; ²Technical University of Munich (TUM), Munich, Germany" |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | "The dataset and code will be released." |
| Open Datasets | Yes | "To facilitate the broad application of 3D visual grounding, we employ both manual annotation and ChatGPT to annotate a large-scale dataset based on KITTI (Geiger, Lenz, and Urtasun 2012) for Mono3DVG." |
| Dataset Splits | Yes | "We split our dataset into 29,990, 5,735, and 5,415 expressions for the train/val/test sets, respectively." |
| Hardware Specification | Yes | "We train 60 epochs with a batch size of 10 by AdamW with 10⁻⁴ learning rate and 10⁻⁴ weight decay on one GTX 3090 24-GiB GPU." |
| Software Dependencies | No | The paper mentions AdamW as its optimizer but does not give version numbers for any software dependencies such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | "We train 60 epochs with a batch size of 10 by AdamW with 10⁻⁴ learning rate and 10⁻⁴ weight decay on one GTX 3090 24-GiB GPU. The learning rate decays by a factor of 10 after 40 epochs. The dropout ratio is set to 0.1." |
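Since the authors' code was not released, the reported training configuration can only be expressed as a minimal PyTorch sketch under stated assumptions. `Mono3DVGModel` below is a hypothetical stand-in for the real network, and the dummy feature/target tensors and regression loss are placeholders; only the hyperparameters (AdamW, 10⁻⁴ learning rate and weight decay, batch size 10, 60 epochs, 10× learning-rate decay after 40 epochs, dropout 0.1) come from the paper.

```python
import torch
from torch import nn

# Hypothetical stand-in for the Mono3DVG network; the real architecture is
# not public, so only the hyperparameters below are taken from the paper.
class Mono3DVGModel(nn.Module):
    def __init__(self, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)  # dropout ratio 0.1, as reported
        self.head = nn.Linear(256, 8)       # placeholder prediction head

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(self.dropout(feats))

model = Mono3DVGModel(dropout=0.1)

# AdamW with 1e-4 learning rate and 1e-4 weight decay, as stated in the paper.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

# Learning rate decays by a factor of 10 after 40 epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40], gamma=0.1)

for epoch in range(60):           # 60 epochs; the real loader uses batch size 10
    feats = torch.randn(10, 256)  # dummy batch standing in for image/text features
    target = torch.randn(10, 8)   # dummy target standing in for 3D box annotations
    loss = nn.functional.mse_loss(model(feats), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The dummy MSE loss is purely illustrative; the paper's actual grounding losses and data pipeline are not specified in enough detail to reconstruct here.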