AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection

Authors: Zehui Chen, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, Feng Zhao, Bolei Zhou, Hang Zhao

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that our approach can lead to 2.3 mAP and 7.0 mAP improvements on the KITTI and nuScenes datasets, respectively. Notably, our best model reaches 70.9 NDS on the nuScenes testing leaderboard, achieving competitive performance among various state-of-the-art methods. (Evidence sections: 4 Experiments; 4.1 Implementation Details; 4.2 Results on KITTI Dataset; 4.3 Results on nuScenes Dataset; 4.4 Ablation Studies)
Researcher Affiliation | Collaboration | 1) University of Science and Technology of China; 2) Harbin Institute of Technology; 3) SenseTime Research; 4) The Chinese University of Hong Kong; 5) Tsinghua University
Pseudocode | No | The paper contains architectural diagrams and describes the method in text, but no structured pseudocode or algorithm blocks are present.
Open Source Code | No | The paper states, 'We use MMDetection3D [Contributors, 2020] as our codebase, and apply the default settings if not specified.' This refers to a third-party open-source codebase the authors built upon, not to their own implementation of AutoAlign. There is no explicit statement or link indicating the release of source code for the proposed method.
Open Datasets | Yes | We evaluate our framework on the KITTI dataset and report the average precision (AP40). We also conduct experiments on the much larger nuScenes dataset with the current state-of-the-art 3D detector CenterPoint to further validate the effectiveness of AutoAlign.
Dataset Splits | Yes | The models are trained on the nuScenes training subset and evaluated on the nuScenes validation subset. Results on KITTI are reported on the validation set with SECOND.
Hardware Specification | No | The paper acknowledges 'the support of GPU cluster built by MCC Lab of Information Science and Technology Institution, USTC' but does not specify exact GPU models, CPUs, or other detailed hardware specifications.
Software Dependencies | No | The paper states 'We use MMDetection3D [Contributors, 2020] as our codebase' and mentions the AdamW and SGD optimizers, but does not provide version numbers for these software components or for the underlying languages and frameworks (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | The hidden units of the cross-attention alignment module are set to 128 and the output sizes of 2D RoIAlign and 3D RoIPooling are both set to 4. The MLP units of the projector and predictor of the self-supervised cross-modal module are 2048 and the hidden unit number is 512. Our 2D-3D joint training framework is optimized in an end-to-end manner with hybrid optimizers, where the 3D branch is optimized with AdamW and the 2D branch is optimized with SGD.
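
To make the Experiment Setup row concrete, here is a minimal PyTorch sketch of how the quoted hyperparameters fit together. This is not the authors' released code: the 256-d feature dimensions, module structure, and learning rates are illustrative assumptions, and only the 128 cross-attention hidden units, the 4x4 RoI output size, the 2048/512 projector-predictor MLP sizes, and the AdamW/SGD pairing come from the paper.

```python
# Minimal sketch of the quoted configuration; assumed values are marked.
import torch
import torch.nn as nn


class CrossAttentionAlignment(nn.Module):
    """Cross-attention alignment module; 128 hidden units per the paper."""

    def __init__(self, img_dim=256, pts_dim=256, hidden=128):  # 256 is assumed
        super().__init__()
        self.q = nn.Linear(pts_dim, hidden)   # queries from 3D point/voxel features
        self.k = nn.Linear(img_dim, hidden)   # keys from 2D image features
        self.v = nn.Linear(img_dim, hidden)
        self.out = nn.Linear(hidden, pts_dim)

    def forward(self, pts_feat, img_feat):
        # pts_feat: (N, pts_dim); img_feat: (M, img_dim)
        q, k, v = self.q(pts_feat), self.k(img_feat), self.v(img_feat)
        attn = torch.softmax(q @ k.t() / k.shape[-1] ** 0.5, dim=-1)
        return pts_feat + self.out(attn @ v)  # residual fusion (assumed)


def build_projector_predictor(in_dim=256, out=2048, hidden=512):
    """Projector/predictor MLPs of the self-supervised cross-modal module:
    2048 output units and 512 hidden units (paper settings)."""
    projector = nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, out))
    predictor = nn.Sequential(
        nn.Linear(out, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, out))
    return projector, predictor


# 2D RoIAlign / 3D RoIPooling output sizes are both 4 in the paper, e.g.
# torchvision.ops.RoIAlign(output_size=4, spatial_scale=1/8, sampling_ratio=-1).

# Hybrid optimizers: AdamW for the 3D branch, SGD for the 2D branch.
# Learning rates below are placeholders, not values from the paper.
align = CrossAttentionAlignment()
projector, predictor = build_projector_predictor()
branch_2d = nn.Linear(256, 256)  # stand-in for the 2D image branch
opt_3d = torch.optim.AdamW(
    list(align.parameters()) + list(projector.parameters())
    + list(predictor.parameters()),
    lr=1e-3, weight_decay=0.01)
opt_2d = torch.optim.SGD(branch_2d.parameters(), lr=0.02, momentum=0.9)

# Smoke test with dummy features.
pts, img = torch.randn(100, 256), torch.randn(64, 256)
fused = align(pts, img)  # -> (100, 256)
```

Keeping two optimizers over disjoint parameter groups is one plausible reading of the paper's 'hybrid optimizers' description: the 2D branch retains the SGD schedule typical of pretrained image detectors while the 3D branch uses AdamW.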