Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders
Authors: Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over KITTI 3D and nuScenes show that MonoMAE outperforms the state-of-the-art consistently and it can generalize to new domains as well. |
| Researcher Affiliation | Academia | Xueying Jiang¹, Sheng Jin¹, Xiaoqin Zhang², Ling Shao³, Shijian Lu¹; ¹S-Lab, Nanyang Technological University, Singapore; ²College of Computer Science and Technology, Zhejiang University of Technology, China; ³UCAS-Terminus AI Lab, University of Chinese Academy of Sciences, China |
| Pseudocode | No | The paper describes the method and its components in detail but does not provide explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The used datasets are publicly available. We will consider releasing the code upon acceptance. |
| Open Datasets | Yes | KITTI 3D [13] comprises 7,481 training images and 7,518 testing images, with training-data labels publicly available and test-data labels stored on a test server for evaluation. nuScenes [3] comprises 1,000 video scenes |
| Dataset Splits | Yes | Following [7], we divide the 7,481 training samples into a new train set with 3,712 images and a validation set with 3,769 images for ablation studies. The dataset [nuScenes] is split into a training set (700 scenes), a validation set (150 scenes), and a test set (150 scenes). |
| Hardware Specification | Yes | We conduct experiments on one NVIDIA V100 GPU and train the framework for 200 epochs with a batch size of 16 and a learning rate of $2 \times 10^{-4}$. |
| Software Dependencies | No | The paper mentions using AdamW, ResNet-50 as backbone, and a 3D detection head from [66], but does not provide specific version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We conduct experiments on one NVIDIA V100 GPU and train the framework for 200 epochs with a batch size of 16 and a learning rate of $2 \times 10^{-4}$. We use the AdamW [36] optimizer with weight decay $10^{-4}$. |
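
The split sizes and training settings quoted in the table can be sanity-checked with a little arithmetic. The following is a minimal sketch, not the authors' code: the names `ReportedSetup`, `KITTI_SPLITS`, `NUSCENES_SPLITS`, and `iterations_per_epoch` are hypothetical, and the step count assumes training on the 3,712-image KITTI train split with the final partial batch kept, which the table does not state.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Training settings quoted in the table; field names are illustrative."""
    epochs: int = 200
    batch_size: int = 16
    learning_rate: float = 2e-4
    weight_decay: float = 1e-4
    optimizer: str = "AdamW"
    gpu: str = "NVIDIA V100"


# Split sizes quoted in the "Dataset Splits" row.
KITTI_SPLITS = {"train": 3712, "val": 3769}                 # images
NUSCENES_SPLITS = {"train": 700, "val": 150, "test": 150}   # scenes


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch, keeping the last partial batch (assumption)."""
    return math.ceil(num_samples / batch_size)


setup = ReportedSetup()

# The KITTI train/val split should add up to the 7,481 labeled training images,
# and the nuScenes splits to the 1,000 scenes.
kitti_total = sum(KITTI_SPLITS.values())
nuscenes_total = sum(NUSCENES_SPLITS.values())

# Assumes training uses only the 3,712-image train split.
steps = iterations_per_epoch(KITTI_SPLITS["train"], setup.batch_size)
total_steps = steps * setup.epochs

print(kitti_total, nuscenes_total, steps, total_steps)
```

Under these assumptions the splits sum to the reported totals, and a 200-epoch run corresponds to 232 steps per epoch. If training instead used all 7,481 labeled images, the step count would differ.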