Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection

Authors: Chaoda Zheng, Feng Wang, Naiyan Wang, Shuguang Cui, Zhen Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method is evaluated on the Waymo Open Dataset (WOD) [27]. We use the official training set, comprising 798 sequences for training, and 202 sequences for evaluation. We apply our automatic pipeline on WOD to construct the object-centric occupancy annotations with the voxel size set to 0.2m. All experiments are conducted on rigid objects (i.e., vehicles) to ensure accurate evaluation of shape completion using our annotated ground-truths. and Tab. 2 presents the 3D detection results on the WOD val set.
Researcher Affiliation | Collaboration | Chaoda Zheng (1,2), Feng Wang (3), Naiyan Wang (4), Shuguang Cui (2,1), Zhen Li (2,1) — 1: FNii-Shenzhen, 2: SSE, CUHK-Shenzhen, 3: TuSimple, 4: Xiaomi EV
Pseudocode | No | The paper includes architecture diagrams (Figures 4, 8, 9) and equations, but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/Ghostish/ObjectCentricOccCompletion
Open Datasets | Yes | Our method is evaluated on the Waymo Open Dataset (WOD) [27].
Dataset Splits | Yes | We use the official training set, comprising 798 sequences for training, and 202 sequences for evaluation.
Hardware Specification | Yes | The model is implemented using PyTorch and trained on 8 NVIDIA 3090 GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or other software dependencies with specific versions.
Experiment Setup | Yes | During training, we randomly sample 1024 voxel centers and corresponding occupancy statuses from each annotated occupancy as the position queries. To ensure the occupancy prediction is not biased, we adopt a balanced sampling strategy, where 512 points are sampled from the occupied voxels and 512 from the free voxels. and We train our model using the Adam optimizer with an initial learning rate of 1e-4 and a batch size of 8. The model is trained for 24 epochs with the learning rate scheduled by the cosine annealing strategy. and We use a transformer with 3 layers, 4 heads, and a hidden dimension of 512.
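The quoted setup (balanced 512/512 occupied-vs-free query sampling, Adam at 1e-4 with cosine annealing over 24 epochs, and a 3-layer, 4-head, 512-dim transformer) can be sketched in PyTorch. This is a minimal illustration of the reported hyperparameters, not the authors' implementation; the function name `sample_position_queries` and the use of `nn.TransformerEncoder` are assumptions (the paper's exact architecture differs), and sampling here is done with replacement for simplicity.

```python
import torch
import torch.nn as nn

def sample_position_queries(voxel_centers, occupancy, n_per_class=512):
    """Balanced sampling of position queries: 512 occupied + 512 free voxels.

    voxel_centers: (N, 3) float tensor of voxel-center coordinates.
    occupancy:     (N,) bool tensor, True where the voxel is occupied.
    Sampling is with replacement here, a simplification of the paper's setup.
    """
    occ_idx = torch.nonzero(occupancy, as_tuple=True)[0]
    free_idx = torch.nonzero(~occupancy, as_tuple=True)[0]
    occ_pick = occ_idx[torch.randint(len(occ_idx), (n_per_class,))]
    free_pick = free_idx[torch.randint(len(free_idx), (n_per_class,))]
    pick = torch.cat([occ_pick, free_pick])  # 1024 queries in total
    return voxel_centers[pick], occupancy[pick]

# Transformer with 3 layers, 4 heads, hidden dimension 512, per the paper's
# reported sizes (a generic encoder stands in for their actual module).
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=4, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=3)

# Adam with lr 1e-4 and cosine annealing over the 24 training epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=24)
```

With a batch size of 8, each training step would draw one such 1024-query set per annotated object occupancy in the batch.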