Mono3DVG: 3D Visual Grounding in Monocular Images
Authors: Yang Zhan, Yuan Yuan, Zhitong Xiong
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive benchmarks and some insightful analyses are provided for Mono3DVG. Extensive comparisons and ablation studies show that our method significantly outperforms all baselines. |
| Researcher Affiliation | Academia | 1 School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China; 2 Technical University of Munich (TUM), Munich, Germany |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The dataset and code will be released. |
| Open Datasets | Yes | To facilitate the broad application of 3D visual grounding, we employ both manual annotation and ChatGPT to annotate a large-scale dataset based on KITTI (Geiger, Lenz, and Urtasun 2012) for Mono3DVG. |
| Dataset Splits | Yes | We split our dataset into 29,990, 5,735, and 5,415 expressions for train/val/test sets respectively. |
| Hardware Specification | Yes | We train 60 epochs with a batch size of 10 by AdamW with 10⁻⁴ learning rate and 10⁻⁴ weight decay on one GTX 3090 24-GiB GPU. |
| Software Dependencies | No | The paper mentions AdamW as the optimizer but does not provide version numbers for any software dependencies such as the programming language, libraries, or frameworks. |
| Experiment Setup | Yes | We train 60 epochs with a batch size of 10 by AdamW with 10⁻⁴ learning rate and 10⁻⁴ weight decay on one GTX 3090 24-GiB GPU. The learning rate decays by a factor of 10 after 40 epochs. The dropout ratio is set to 0.1. |
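The experiment-setup row above can be sketched as a small schedule helper. This is a minimal pure-Python sketch of the reported hyperparameters, not the authors' code; the config keys and function name are illustrative.

```python
# Hyperparameters as reported in the paper (see Experiment Setup above).
TRAIN_CONFIG = {
    "epochs": 60,
    "batch_size": 10,
    "optimizer": "AdamW",
    "base_lr": 1e-4,
    "weight_decay": 1e-4,
    "lr_decay_epoch": 40,    # lr decays by a factor of 10 after 40 epochs
    "lr_decay_factor": 0.1,
    "dropout": 0.1,
}

def lr_at_epoch(epoch: int, cfg: dict = TRAIN_CONFIG) -> float:
    """Step schedule: base_lr until the decay epoch, then base_lr * factor."""
    if epoch >= cfg["lr_decay_epoch"]:
        return cfg["base_lr"] * cfg["lr_decay_factor"]
    return cfg["base_lr"]

print(lr_at_epoch(0))   # 0.0001
```

In a framework such as PyTorch, the same schedule would typically be expressed with `torch.optim.AdamW` plus a `MultiStepLR` scheduler with `milestones=[40]` and `gamma=0.1`.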