Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection
Authors: Chaoda Zheng, Feng Wang, Naiyan Wang, Shuguang Cui, Zhen Li
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our method is evaluated on the Waymo Open Dataset (WOD)[27]. We use the official training set, comprising 798 sequences for training, and 202 sequences for evaluation. We apply our automatic pipeline on WOD to construct the object-centric occupancy annotations with the voxel size set to 0.2m. All experiments are conducted on rigid objects (i.e., vehicles) to ensure accurate evaluation of shape completion using our annotated ground-truths. Tab. 2 presents the 3D detection results on the WOD val set. |
| Researcher Affiliation | Collaboration | Chaoda Zheng (1,2), Feng Wang (3), Naiyan Wang (4), Shuguang Cui (2,1), Zhen Li (2,1); 1 = FNii-Shenzhen, 2 = SSE, CUHK-Shenzhen, 3 = TuSimple, 4 = Xiaomi EV |
| Pseudocode | No | The paper includes architecture diagrams (Figure 4, 8, 9) and equations, but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/Ghostish/ObjectCentricOccCompletion |
| Open Datasets | Yes | Our method is evaluated on the Waymo Open Dataset (WOD)[27]. |
| Dataset Splits | Yes | We use the official training set, comprising 798 sequences for training, and 202 sequences for evaluation. |
| Hardware Specification | Yes | The model is implemented using PyTorch and trained on 8 NVIDIA 3090 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or other software dependencies with specific versions. |
| Experiment Setup | Yes | During training, we randomly sample 1024 voxel centers and corresponding occupancy statuses from each annotated occupancy as the position queries. To ensure the occupancy prediction is not biased, we adopt a balanced sampling strategy, where 512 points are sampled from the occupied voxels and 512 from the free voxels. We train our model using the Adam optimizer with an initial learning rate of 1e-4 and a batch size of 8. The model is trained for 24 epochs with the learning rate scheduled by the cosine annealing strategy. We use a transformer with 3 layers, 4 heads, and a hidden dimension of 512. |
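The balanced position-query sampling described in the experiment setup (1024 queries per object, split 512/512 between occupied and free voxels, with a 0.2 m voxel size) can be sketched as follows. This is an illustrative NumPy reconstruction, not the authors' code: the function name `sample_balanced_queries` and the boolean-grid input format are assumptions.

```python
import numpy as np

def sample_balanced_queries(occ_grid, n_queries=1024, voxel_size=0.2, rng=None):
    """Sample position queries from a boolean occupancy grid.

    Half the queries are drawn (with replacement) from occupied voxels
    and half from free voxels, mirroring the paper's balanced sampling
    strategy. Returns metric voxel centers and 0/1 occupancy labels.
    """
    rng = rng or np.random.default_rng()
    half = n_queries // 2                  # 512 occupied + 512 free for 1024 queries
    occ_idx = np.argwhere(occ_grid)        # integer indices of occupied voxels
    free_idx = np.argwhere(~occ_grid)      # integer indices of free voxels
    occ_pick = occ_idx[rng.integers(len(occ_idx), size=half)]
    free_pick = free_idx[rng.integers(len(free_idx), size=half)]
    coords = np.concatenate([occ_pick, free_pick], axis=0)
    labels = np.concatenate([np.ones(half), np.zeros(half)])
    # Convert voxel indices to metric centers using the 0.2 m voxel size.
    centers = (coords + 0.5) * voxel_size
    return centers, labels
```

Sampling with replacement keeps the routine well-defined even when an object has fewer than 512 occupied (or free) annotated voxels; whether the authors handle that edge case this way is not stated in the report.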