OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries

Authors: Yuhang Lu, Xinge Zhu, Tai Wang, Yuexin Ma

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive evaluations show that OctreeOcc not only surpasses state-of-the-art methods in occupancy prediction, but also achieves a 15%–24% reduction in computational overhead compared to dense-grid-based methods. ... Our extensive evaluations against state-of-the-art occupancy prediction methods show that OctreeOcc outperforms others on the nuScenes and SemanticKITTI datasets, reducing computational overhead by 15%–24% compared to dense-grid-based methods. Ablation studies further validate the effectiveness of each module within our method.
Researcher Affiliation | Academia | Yuhang Lu, ShanghaiTech University, luyh2@shanghaitech.edu.cn; Xinge Zhu, The Chinese University of Hong Kong, zhuxinge123@gmail.com; Tai Wang, Shanghai AI Laboratory, taiwang.me@gmail.com; Yuexin Ma, ShanghaiTech University, mayuexin@shanghaitech.edu.cn
Pseudocode | No | No pseudocode or algorithm blocks explicitly labeled as 'Algorithm' or 'Pseudocode' were found in the paper.
Open Source Code | No | We will make the code publicly available upon acceptance of the paper to advance the field.
Open Datasets | Yes | Occ3D-nuScenes (23) re-annotates the nuScenes dataset (49) with precise occupancy labels derived from LiDAR scans and human annotations. ... SemanticKITTI (50) comprises 22 distinct outdoor driving scenarios...
Dataset Splits | Yes | Occ3D-nuScenes... It includes 700 training instances and 150 validation instances... We compare our results with those of other SSC methods on the SemanticKITTI validation set. ... All the experiments are conducted on the Occ3D-nuScenes validation set...
Hardware Specification | Yes | The model is trained for 24 epochs, consuming around 3 days on 8 NVIDIA A100 GPUs. ... All the experiments are conducted on the NVIDIA A40 GPU, with the input image size reduced to 0.3×.
Software Dependencies | No | The paper mentions using the 'Adam (53) optimizer', 'ResNet101-DCN (51)', 'Feature Pyramid Network (52)', 'UNet (47)', and 'deformable attention (48)'. However, it does not specify version numbers for any software libraries, frameworks, or programming languages.
Experiment Setup | Yes | We set the input image size to 900×1600 and employ ResNet101-DCN (51) as the image backbone. Multi-scale features are extracted from the Feature Pyramid Network (52) with downsampling sizes of 1/8, 1/16, 1/32, and 1/64. The feature dimension C is set to 256. The octree depth is 3, and the initial query resolution is 50×50×4. We choose query selection ratios of 20% and 60% for the two divisions. The octree encoder comprises three layers, each composed of TSA, ICA, and Iterative Structure Rectification (ISR) modules. Both M1 and M2 are set to 4. In TSA, we fuse four temporal frames. In ISR, the top 10% predictions are considered high-confidence in level 1, and 30% in level 2. The loss weights are uniformly set to 1.0. For optimization, we employ the Adam (53) optimizer with a learning rate of 2e-4 and weight decay of 0.01. The batch size is 8, and the model is trained for 24 epochs.
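As a minimal sketch of the experiment setup quoted above, the reported hyperparameters could be collected into a single configuration object. All field names here are our own (hypothetical); only the values come from the paper's setup description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OctreeOccConfig:
    """Hypothetical container for the hyperparameters quoted in the paper;
    field names are illustrative, not from any released code."""
    image_size: tuple = (900, 1600)           # input H x W
    backbone: str = "ResNet101-DCN"
    fpn_strides: tuple = (8, 16, 32, 64)      # FPN downsampling: 1/8 ... 1/64
    feature_dim: int = 256                    # C
    octree_depth: int = 3
    init_query_resolution: tuple = (50, 50, 4)
    selection_ratios: tuple = (0.20, 0.60)    # per octree division
    encoder_layers: int = 3                   # each with TSA, ICA, ISR
    temporal_frames: int = 4                  # fused in TSA
    isr_high_conf_ratio: tuple = (0.10, 0.30) # levels 1 and 2
    loss_weight: float = 1.0
    lr: float = 2e-4                          # Adam learning rate
    weight_decay: float = 0.01
    batch_size: int = 8
    epochs: int = 24

cfg = OctreeOccConfig()
print(cfg.octree_depth, cfg.init_query_resolution)  # → 3 (50, 50, 4)
```

A frozen dataclass keeps the reported settings immutable and easy to compare against when attempting a reproduction.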