MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders
Authors: Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over KITTI 3D and nuScenes show that MonoMAE outperforms the state-of-the-art consistently and it can generalize to new domains as well. |
| Researcher Affiliation | Academia | Xueying Jiang¹, Sheng Jin¹, Xiaoqin Zhang², Ling Shao³, Shijian Lu¹. ¹S-Lab, Nanyang Technological University, Singapore; ²College of Computer Science and Technology, Zhejiang University of Technology, China; ³UCAS-Terminus AI Lab, University of Chinese Academy of Sciences, China |
| Pseudocode | No | The paper describes the method and its components in detail but does not provide explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The used datasets are publicly available. We will consider releasing the code upon acceptance. |
| Open Datasets | Yes | KITTI 3D [13] comprises 7,481 training images and 7,518 testing images, with training-data labels publicly available and test-data labels stored on a test server for evaluation. nuScenes [3] comprises 1,000 video scenes. |
| Dataset Splits | Yes | Following [7], we divide the 7,481 training samples into a new train set with 3,712 images and a validation set with 3,769 images for ablation studies. The dataset [nuScenes] is split into a training set (700 scenes), a validation set (150 scenes), and a test set (150 scenes). |
| Hardware Specification | Yes | We conduct experiments on one NVIDIA V100 GPU and train the framework for 200 epochs with a batch size of 16 and a learning rate of 2×10⁻⁴. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer, a ResNet-50 backbone, and a 3D detection head from [66], but does not provide version numbers for these software components or for any other libraries. |
| Experiment Setup | Yes | We conduct experiments on one NVIDIA V100 GPU and train the framework for 200 epochs with a batch size of 16 and a learning rate of 2×10⁻⁴. We use the AdamW [36] optimizer with weight decay 10⁻⁴. |
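Since the official code is unreleased, the hyperparameters reported above can be collected into a single configuration sketch for reproduction attempts. Every field name below is illustrative rather than taken from the authors' implementation; only the values come from the paper.

```python
from dataclasses import dataclass

@dataclass
class MonoMAETrainConfig:
    """Training recipe as reported in the MonoMAE paper.

    Field names are hypothetical; the values mirror the paper's
    stated setup (one NVIDIA V100, 200 epochs, batch size 16).
    """
    epochs: int = 200
    batch_size: int = 16
    learning_rate: float = 2e-4   # 2x10^-4, per the paper
    weight_decay: float = 1e-4    # 10^-4, used with AdamW [36]
    optimizer: str = "AdamW"
    backbone: str = "ResNet-50"   # detection head from [66]

cfg = MonoMAETrainConfig()
print(cfg.learning_rate, cfg.weight_decay)
```

In a PyTorch-based reproduction, these values would typically be passed to `torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate, weight_decay=cfg.weight_decay)`.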