OctOcc: High-Resolution 3D Occupancy Prediction with Octree

Authors: Wenzhe Ouyang, Xiaolin Song, Bailan Feng, Zenglin Xu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed OctOcc significantly outperforms existing methods on nuScenes and SemanticKITTI datasets with limited memory consumption.
Researcher Affiliation | Collaboration | Wenzhe Ouyang¹*, Xiaolin Song², Bailan Feng², Zenglin Xu¹·³ — ¹Harbin Institute of Technology, Shenzhen, Guangdong, China; ²Huawei Noah's Ark Lab, Beijing, China; ³Peng Cheng Lab, Shenzhen, Guangdong, China
Pseudocode | No | The paper includes a network diagram (Figure 2) but does not provide pseudocode or a clearly labeled algorithm block in text format.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | To validate the effectiveness of the proposed method, we conduct comprehensive experiments on vision-based benchmarks: the surrounding-view dataset nuScenes (Caesar et al. 2020) and the monocular-view SemanticKITTI (Behley et al. 2019).
Dataset Splits | Yes | We evaluate our model on the validation set. For 3D semantic occupancy prediction, we use mean Intersection over Union (mIoU) to evaluate the performance of a model. (A minimal mIoU sketch follows the table.)
Hardware Specification | No | The paper mentions memory consumption ('less than 7GB', 'Over 32G') but does not specify any particular GPU models, CPU types, or other hardware components used for running experiments.
Software Dependencies | No | The paper mentions models and frameworks used (e.g., 'ResNet101-DCN', 'FCOS3D', 'FPN') but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 1x.x).
Experiment Setup | Yes | The whole network architecture is set to 4 levels, and the resolutions of the 3D voxel queries are set to 25×25×2, 50×50×4, 100×100×8, and 200×200×16. The first three layers are supervised with binary 3D occupancy labels, and the last layer is supervised with 3D semantic occupancy labels. The top-k values between the four 3D voxel feature layers are set to 625, 3000, and 15000, respectively. (A configuration sketch follows the table.)
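
For reference, mIoU averages the per-class intersection-over-union across all semantic classes. Below is a minimal sketch of the standard computation from a confusion matrix; it is not code from the paper, and the function name is our own.

    import numpy as np

    def miou_from_confusion(conf: np.ndarray) -> float:
        """Mean IoU from a C x C confusion matrix, where conf[i, j] counts
        voxels with ground-truth class i predicted as class j."""
        tp = np.diag(conf).astype(np.float64)   # true positives per class
        fp = conf.sum(axis=0) - tp              # false positives per class
        fn = conf.sum(axis=1) - tp              # false negatives per class
        denom = tp + fp + fn
        iou = np.where(denom > 0, tp / np.maximum(denom, 1.0), np.nan)
        return float(np.nanmean(iou))           # average over classes that appear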
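
To make the coarse-to-fine experiment setup concrete, here is a minimal sketch of how the four-level voxel-query hierarchy and per-level top-k refinement could be wired up. This is an illustrative reconstruction under our own assumptions, not the authors' implementation; refine_queries and the selection rule are hypothetical.

    import numpy as np

    # Per-level 3D voxel query resolutions (X, Y, Z), coarse to fine,
    # as reported in the paper's experiment setup.
    LEVEL_RESOLUTIONS = [(25, 25, 2), (50, 50, 4), (100, 100, 8), (200, 200, 16)]

    # Top-k queries kept when refining one level into the next.
    TOP_K = [625, 3000, 15000]

    def refine_queries(occupancy_scores: np.ndarray, k: int) -> np.ndarray:
        """Pick the k most-likely-occupied voxels of a level to subdivide.
        Each selected voxel spawns eight children at the next (2x finer)
        level, octree-style; the rest are pruned."""
        k = min(k, occupancy_scores.size)
        return np.argpartition(occupancy_scores, -k)[-k:]

    # Example: the coarsest level has 25 * 25 * 2 = 1250 voxels; keeping the
    # top 625 means half of them are expanded toward the 50 x 50 x 4 level.
    coarse_scores = np.random.rand(25 * 25 * 2)
    kept = refine_queries(coarse_scores, TOP_K[0])
    print(kept.shape)  # (625,)

Note that the reported top-k values shrink as a fraction of each level's size (625 of 1,250 coarse voxels, 3,000 of 10,000, 15,000 of 80,000), so progressively fewer voxels are refined at higher resolutions; whether OctOcc selects exactly this way is an assumption here.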