Multimodal Virtual Point 3D Detection

Authors: Tianwei Yin, Xingyi Zhou, Philipp Krähenbühl

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the large-scale nuScenes dataset show that our framework improves a strong CenterPoint baseline by a significant 6.6 mAP, and outperforms competing fusion approaches.
Researcher Affiliation | Academia | Tianwei Yin (UT Austin, yintianwei@utexas.edu); Xingyi Zhou (UT Austin, zhouxy@cs.utexas.edu); Philipp Krähenbühl (UT Austin, philkr@cs.utexas.edu)
Pseudocode | Yes | Algorithm 1: Multi-modal Virtual Point Generation (a hedged sketch of this procedure follows the table).
Open Source Code | Yes | Code and more visualizations are available at https://tianweiy.github.io/mvp/.
Open Datasets | Yes | We test our model on the large-scale nuScenes dataset [2].
Dataset Splits | Yes | We follow the official dataset split to use 700, 150, 150 sequences for training, validation, and testing.
Hardware Specification | Yes | The training takes 2.5 days on 4 V100 GPUs with a batch size of 16 (4 frames per GPU).
Software Dependencies | No | The paper mentions software used ('CenterPoint', 'CenterNet2') but does not specify version numbers for these or for underlying dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train the detector on the nuScenes dataset using the SGD optimizer with a batch size of 16 and a learning rate of 0.02 for 90000 iterations. ... We train the model for 20 epochs with the AdamW [34] optimizer using the one-cycle policy [16], with a max learning rate of 3e-3 following [66]. (A config sketch of these schedules also follows the table.)
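
The Pseudocode row above refers to Algorithm 1 (Multi-modal Virtual Point Generation). The NumPy sketch below illustrates the general idea under stated assumptions: the function name generate_virtual_points, its argument layout (per-instance mask, camera intrinsics K, lidar-to-camera extrinsics T_lidar_to_cam), and the default of 50 virtual points per instance are illustrative placeholders, not the authors' released implementation (available at the URL above).

import numpy as np

def generate_virtual_points(lidar_xyz, mask, K, T_lidar_to_cam, num_virtual=50):
    """Lift 2D pixels sampled inside one instance mask to 3D virtual points.

    lidar_xyz:      (N, 3) lidar points in the lidar frame
    mask:           (H, W) boolean instance mask from a 2D detector
    K:              (3, 3) camera intrinsic matrix
    T_lidar_to_cam: (4, 4) lidar-to-camera extrinsic transform
    num_virtual:    number of virtual points sampled for this instance
    """
    # 1. Project lidar points into the image plane.
    pts_h = np.concatenate([lidar_xyz, np.ones((len(lidar_xyz), 1))], axis=1)
    cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]                          # keep points in front of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]

    # 2. Keep projections that land inside the image and inside this instance mask.
    H, W = mask.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < W) & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    uv, depth = uv[ok], cam[ok, 2]
    inside = mask[uv[:, 1].astype(int), uv[:, 0].astype(int)]
    uv, depth = uv[inside], depth[inside]
    if len(uv) == 0:
        return np.zeros((0, 3))

    # 3. Randomly sample pixel locations inside the instance mask.
    ys, xs = np.nonzero(mask)
    idx = np.random.choice(len(xs), size=min(num_virtual, len(xs)), replace=False)
    samples = np.stack([xs[idx], ys[idx]], axis=1).astype(np.float64)

    # 4. Borrow depth from the nearest projected lidar point (2D nearest neighbour).
    d2 = ((samples[:, None, :] - uv[None, :, :]) ** 2).sum(-1)
    z = depth[d2.argmin(axis=1)]

    # 5. Unproject the sampled pixels and map them back to the lidar frame.
    rays = (np.linalg.inv(K) @ np.concatenate([samples, np.ones((len(samples), 1))], axis=1).T).T
    virtual_cam = rays * z[:, None]
    virtual_h = np.concatenate([virtual_cam, np.ones((len(virtual_cam), 1))], axis=1)
    return (np.linalg.inv(T_lidar_to_cam) @ virtual_h.T).T[:, :3]

In the full pipeline the virtual points would be concatenated with the real lidar points (together with semantic features from the 2D detector) before being passed to the 3D detector; that step is omitted from this sketch.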
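
The Experiment Setup row quotes two separate training schedules, one for the 2D detector and one for the 3D detector. The minimal PyTorch sketch below only reproduces the quoted hyperparameters; the placeholder modules, the momentum and weight-decay values, and iters_per_epoch are assumptions, since the excerpt does not state them.

import torch

# 2D detector (CenterNet2-style): SGD, lr 0.02, batch size 16, 90,000 iterations.
detector_2d = torch.nn.Linear(10, 10)   # placeholder module, not the real network
opt_2d = torch.optim.SGD(detector_2d.parameters(), lr=0.02, momentum=0.9)
num_iters_2d = 90_000

# 3D detector (CenterPoint-style): AdamW with a one-cycle schedule, max lr 3e-3, 20 epochs.
detector_3d = torch.nn.Linear(10, 10)   # placeholder module, not the real network
opt_3d = torch.optim.AdamW(detector_3d.parameters(), lr=3e-3, weight_decay=0.01)
epochs, iters_per_epoch = 20, 1000      # iters_per_epoch depends on dataset size and batch size
sched_3d = torch.optim.lr_scheduler.OneCycleLR(
    opt_3d, max_lr=3e-3, total_steps=epochs * iters_per_epoch)
# sched_3d.step() would be called after every optimizer step during training.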