OctOcc: High-Resolution 3D Occupancy Prediction with Octree
Authors: Wenzhe Ouyang, Xiaolin Song, Bailan Feng, Zenglin Xu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that the proposed OctOcc significantly outperforms existing methods on the nuScenes and SemanticKITTI datasets with limited memory consumption. |
| Researcher Affiliation | Collaboration | Wenzhe Ouyang¹*, Xiaolin Song², Bailan Feng², Zenglin Xu¹,³ — ¹Harbin Institute of Technology, Shenzhen, Guangdong, China; ²Huawei Noah's Ark Lab, Beijing, China; ³Peng Cheng Lab, Shenzhen, Guangdong, China |
| Pseudocode | No | The paper includes a network diagram (Figure 2) but does not provide pseudocode or a clearly labeled algorithm block in text format. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | To validate the effectiveness of the proposed method, we conduct comprehensive experiments on vision-based benchmarks: the surrounding-view dataset nuScenes (Caesar et al. 2020) and the monocular-view SemanticKITTI (Behley et al. 2019). |
| Dataset Splits | Yes | We evaluate our model on the validation set. For 3D semantic occupancy prediction, we use mean Intersection over Union (mIoU) to evaluate the performance of a model. |
| Hardware Specification | No | The paper mentions memory consumption ('less than 7GB', 'Over 32G') but does not specify any particular GPU models, CPU types, or other hardware components used to run the experiments. |
| Software Dependencies | No | The paper mentions models and frameworks used (e.g., 'ResNet101-DCN', 'FCOS3D', 'FPN') but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, CUDA 1x.x). |
| Experiment Setup | Yes | The whole network architecture is set to 4 levels, and the resolutions of the 3D voxel queries are set to 25×25×2, 50×50×4, 100×100×8, and 200×200×16. The first three layers are supervised with binary 3D occupancy labels, and the last layer is supervised with 3D semantic occupancy labels. The values of top-k between the four 3D voxel feature layers are set to 625, 3000, and 15000, respectively. |
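Since no source code is released, the coarse-to-fine schedule in the row above can only be sketched. The following is a minimal, hedged illustration of the stated configuration: four levels of voxel queries, with the top-k most-likely-occupied voxels at each coarse level carried forward for refinement at the next. All function and variable names are assumptions for illustration, not the authors' implementation.

```python
import torch

# Voxel-query resolutions (X, Y, Z) for the 4 levels, as stated in the paper.
LEVEL_RESOLUTIONS = [(25, 25, 2), (50, 50, 4), (100, 100, 8), (200, 200, 16)]
# Number of occupied candidates carried from level i to level i+1 (paper values).
TOP_K = [625, 3000, 15000]


def select_topk_voxels(occupancy_logits: torch.Tensor, k: int) -> torch.Tensor:
    """Return flat indices of the k voxels most likely to be occupied.

    occupancy_logits: shape (X*Y*Z,), binary-occupancy logits for one level.
    """
    probs = occupancy_logits.sigmoid()
    return probs.topk(k).indices  # indices sorted by descending probability


# Toy walk-through: at each of the three coarse levels, score every voxel
# query (random stand-in here) and keep only top-k as parents to subdivide
# at the next, 2x-finer level.
torch.manual_seed(0)
for level, ((x, y, z), k) in enumerate(zip(LEVEL_RESOLUTIONS[:-1], TOP_K)):
    logits = torch.randn(x * y * z)  # stand-in for predicted binary occupancy
    kept = select_topk_voxels(logits, k)
    print(f"level {level}: {x * y * z} queries -> keep {kept.numel()}")
```

Note how the schedule keeps memory bounded: only 625 of the 1,250 level-0 voxels, 3,000 of 10,000 at level 1, and 15,000 of 80,000 at level 2 are refined, consistent with the paper's limited-memory claim.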