MonoDistill: Learning Spatial Features for Monocular 3D Object Detection
Authors: Zhiyu Chong, Xinzhu Ma, Hong Zhang, Yuxin Yue, Haojie Li, Zhihui Wang, Wanli Ouyang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that the proposed method can significantly boost the performance of the baseline model and ranks 1st place among all monocular-based methods on the KITTI benchmark. Besides, extensive ablation studies are conducted, which further prove the effectiveness of each part of our designs and illustrate what the baseline model has learned from the LiDAR Net. |
| Researcher Affiliation | Academia | Zhiyu Chong (1), Xinzhu Ma (2), Hong Zhang (1), Yuxin Yue (1), Haojie Li (1), Zhihui Wang (1), and Wanli Ouyang (2); (1) Dalian University of Technology, (2) The University of Sydney |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Our code will be released at https://github.com/monster-ghost/MonoDistill. |
| Open Datasets | Yes | We conduct our experiments on the KITTI (Geiger et al., 2012), which is the most commonly used dataset in the 3D detection task. |
| Dataset Splits | Yes | Specifically, this dataset provides 7,481 training samples and 7,518 testing samples, and we further divide the training data into a train set (3,712 samples) and a validation set (3,769 samples), following prior works (Chen et al., 2015). |
| Hardware Specification | Yes | Our model is trained on 2 NVIDIA 1080Ti GPUs in an end-to-end manner for 150 epochs. |
| Software Dependencies | No | The paper states 'We implemented our method using PyTorch.' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Our model is trained on 2 NVIDIA 1080Ti GPUs in an end-to-end manner for 150 epochs. We employ the common Adam optimizer with initial learning rate 1.25e-4, and decay it by a factor of ten at 90 and 120 epochs. To stabilize the training process, we also applied the warm-up strategy (5 epochs). As for data augmentations, only random flip and center crop are applied. (See the sketch below the table.) |
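
The experiment-setup row quotes a concrete optimization schedule. The PyTorch snippet below is a minimal sketch of that schedule under stated assumptions, not the authors' released code: the `model` placeholder and the loop body are assumptions, and the 5-epoch warm-up is omitted. Only the optimizer choice, initial learning rate, decay milestones, and epoch count come from the quoted text.

```python
# Minimal sketch (assumed, not the authors' released code) of the quoted schedule:
# Adam, initial lr 1.25e-4, decayed by 10x at epochs 90 and 120, 150 epochs total.
# The 5-epoch warm-up and the actual MonoDistill detector are omitted here.
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(8, 8)  # placeholder for the monocular student network
optimizer = Adam(model.parameters(), lr=1.25e-4)
scheduler = MultiStepLR(optimizer, milestones=[90, 120], gamma=0.1)

for epoch in range(150):
    # ... one training epoch over the KITTI train split (3,712 samples) ...
    scheduler.step()
```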