OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries
Authors: Yuhang Lu, Xinge Zhu, Tai Wang, Yuexin Ma
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive evaluations show that OctreeOcc not only surpasses state-of-the-art methods in occupancy prediction, but also achieves a 15%-24% reduction in computational overhead compared to dense-grid-based methods. ... Our extensive evaluations against state-of-the-art occupancy prediction methods show that OctreeOcc outperforms others on nuScenes and SemanticKITTI datasets, reducing computational overhead by 15%-24% for dense-grid-based methods. Ablation studies further validate the effectiveness of each module within our method. |
| Researcher Affiliation | Academia | Yuhang Lu, ShanghaiTech University, luyh2@shanghaitech.edu.cn; Xinge Zhu, The Chinese University of Hong Kong, zhuxinge123@gmail.com; Tai Wang, Shanghai AI Laboratory, taiwang.me@gmail.com; Yuexin Ma, ShanghaiTech University, mayuexin@shanghaitech.edu.cn |
| Pseudocode | No | No pseudocode or algorithm blocks explicitly labeled as 'Algorithm' or 'Pseudocode' were found in the paper. |
| Open Source Code | No | We will make the code publicly available upon acceptance of the paper to advance the field. |
| Open Datasets | Yes | Occ3D-nuScenes (23) re-annotates the nuScenes dataset (49) with precise occupancy labels derived from LiDAR scans and human annotations. ... SemanticKITTI (50) comprises 22 distinct outdoor driving scenarios... |
| Dataset Splits | Yes | Occ3D-nuScenes... It includes 700 training instances and 150 validation instances... We compare our results with those of other SSC methods on the SemanticKITTI validation set. ... All the experiments are conducted on the Occ3d-nus validation set... |
| Hardware Specification | Yes | The model is trained for 24 epochs, consuming around 3 days on 8 NVIDIA A100 GPUs. ... All the experiments are conducted on the NVIDIA A40 GPU with reducing the input image size to 0.3x. |
| Software Dependencies | No | The paper mentions using 'Adam (53) optimizer', 'ResNet101-DCN (51)', 'Feature Pyramid Network (52)', 'UNet (47)', and 'deformable attention (48)'. However, it does not specify version numbers for any software libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | We set the input image size to 900 × 1600 and employ ResNet101-DCN (51) as the image backbone. Multi-scale features are extracted from the Feature Pyramid Network (52) with downsampling sizes of 1/8, 1/16, 1/32, and 1/64. The feature dimension C is set to 256. The octree depth is 3, and the initial query resolution is 50 × 50 × 4. We choose query selection ratios of 20% and 60% for the two divisions. The octree encoder comprises three layers, each composed of TSA, ICA, and Iterative Structure Rectification (ISR) modules. Both M1 and M2 are set to 4. In TSA, we fuse four temporal frames. In ISR, the top 10% predictions are considered high-confidence in level 1, and 30% in level 2. The loss weights are uniformly set to 1.0. For optimization, we employ Adam (53) optimizer with a learning rate of 2e-4 and weight decay of 0.01. The batch size is 8, and the model is trained for 24 epochs. |
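
The octree settings quoted in the Experiment Setup row can be turned into a rough query budget. The sketch below is an illustration only, not the authors' code: the class and function names are hypothetical, and it assumes that each "query selection ratio" is the fraction of nodes at a level that is split into 8 children, with the remaining nodes kept as leaf queries. The paper's actual selection rule may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OctreeSetup:
    """Values quoted in the Experiment Setup row; the names are hypothetical."""
    base_resolution: tuple = (50, 50, 4)   # initial query resolution
    depth: int = 3                         # octree depth
    split_ratios: tuple = (0.20, 0.60)     # selection ratios for the two divisions


def level_resolutions(cfg):
    """Grid resolution at each level; every division doubles each axis."""
    return [tuple(d * 2 ** lvl for d in cfg.base_resolution)
            for lvl in range(cfg.depth)]


def leaf_query_counts(cfg):
    """Leaf queries per level, ASSUMING a split node yields 8 children and
    unsplit nodes stay as leaves (an interpretation, not confirmed by the source)."""
    x, y, z = cfg.base_resolution
    nodes = x * y * z
    leaves = []
    for ratio in cfg.split_ratios:
        split = round(nodes * ratio)   # nodes selected for subdivision
        leaves.append(nodes - split)   # the rest remain coarse leaves
        nodes = split * 8              # children form the next level
    leaves.append(nodes)               # the finest level is never split
    return leaves


cfg = OctreeSetup()
fx, fy, fz = level_resolutions(cfg)[-1]
dense_voxels = fx * fy * fz
sparse_queries = sum(leaf_query_counts(cfg))
print(level_resolutions(cfg))
print(leaf_query_counts(cfg), sparse_queries, dense_voxels)
```

Under these assumptions the finest level works out to 200 × 200 × 16, and the octree holds far fewer leaf queries than a dense grid at that resolution. This is a count of queries only and says nothing about the measured 15%-24% overhead reduction, which also depends on the rest of the network.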