OctAttention: Octree-Based Large-Scale Contexts Model for Point Cloud Compression
Authors: Chunyang Fu, Ge Li, Rui Song, Wei Gao, Shan Liu
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to the previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark (e.g. SemanticKITTI) and object point cloud dataset (e.g. MPEG 8i, MVUB), and saves 95% coding time compared to the voxel-based baseline. The experiments show that our method outperforms these state-of-the-art methods, which are only designed for a specific category of point clouds. We perform an ablation experiment on SemanticKITTI to demonstrate the effectiveness of a large receptive field context. |
| Researcher Affiliation | Collaboration | Chunyang Fu (1,2), Ge Li (1), Rui Song (1), Wei Gao (1,2)*, Shan Liu (3); 1: School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School; 2: Peng Cheng Laboratory; 3: Tencent America |
| Pseudocode | No | The paper describes the proposed method in prose and equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/zb12138/OctAttention. |
| Open Datasets | Yes | LiDAR dataset: SemanticKITTI (Behley et al. 2019) is a large sparse LiDAR dataset for self-driving. Object point cloud datasets: Microsoft Voxelized Upper Bodies (MVUB) (Charles et al. 2016) is a dynamic voxelized point cloud dataset; 8i Voxelized Full Bodies (MPEG 8i) (Eugene et al. 2017) includes sequences of smooth surface and complete human shape point clouds. |
| Dataset Splits | No | The paper specifies training and testing splits for the SemanticKITTI dataset ('sequences 00 to 10... for training, and sequences 11 to 21... for testing'), but does not mention a separate validation split or provide details for one. |
| Hardware Specification | Yes | We implement our model in PyTorch and perform the training/testing with a Xeon E5-2637 CPU and one NVIDIA TITAN Xp GPU (12 GB memory). |
| Software Dependencies | No | The paper states 'We implement our model in PyTorch' but does not specify version numbers for PyTorch or any other software dependencies used in the experiments. |
| Experiment Setup | Yes | We use batch sizes of 32, epochs of 8 and Adam optimizer with a learning rate of 1e-3. Occupancy, level index, and octant index are embedded into 128, 6, and 4 dimensions, respectively. We set K = 4, N = N0 = 1024 and use 2 layers and 4 heads in multi-head self-attention in experiments unless otherwise specified. |
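
To make the quoted configuration concrete, below is a minimal PyTorch sketch assembled from the details reported in the table (SemanticKITTI sequences 00-10 for training and 11-21 for testing; batch size 32, 8 epochs, Adam with learning rate 1e-3; 128/6/4-dimensional occupancy/level/octant embeddings; K = 4 ancestors, context window N = 1024, 2 attention layers, 4 heads). It is an illustrative sketch, not the released zb12138/OctAttention implementation: the class name `OctAttentionSketch`, the embedding table sizes (256 occupancy symbols, 16 octree levels, 8 octants), the 255-way output head, and the omission of the paper's masked sliding-window attention are assumptions made for brevity.

```python
# Minimal sketch, assuming PyTorch >= 1.9 (batch_first TransformerEncoderLayer).
# Table sizes and names marked "assumed" are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn

# Split reported for SemanticKITTI: sequences 00-10 train, 11-21 test (no validation split described).
TRAIN_SEQUENCES = [f"{i:02d}" for i in range(0, 11)]
TEST_SEQUENCES = [f"{i:02d}" for i in range(11, 22)]

# Hyperparameters quoted in the table above.
BATCH_SIZE = 32
EPOCHS = 8
LEARNING_RATE = 1e-3
CONTEXT_WINDOW = 1024        # N = N0
NUM_ANCESTORS = 4            # K
OCC_DIM, LEVEL_DIM, OCTANT_DIM = 128, 6, 4
NUM_LAYERS, NUM_HEADS = 2, 4


class OctAttentionSketch(nn.Module):
    """Embeds occupancy/level/octant indices of each node and its K ancestors,
    concatenates them into one token per node, and applies a small Transformer
    encoder to predict each node's 255-way occupancy symbol."""

    def __init__(self):
        super().__init__()
        self.occ_embed = nn.Embedding(256, OCC_DIM)      # 256 occupancy symbols (assumed table size)
        self.level_embed = nn.Embedding(16, LEVEL_DIM)   # up to 16 octree levels (assumed)
        self.octant_embed = nn.Embedding(8, OCTANT_DIM)  # 8 octants
        d_model = NUM_ANCESTORS * (OCC_DIM + LEVEL_DIM + OCTANT_DIM)  # 4 * 138 = 552
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=NUM_HEADS, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=NUM_LAYERS)
        self.head = nn.Linear(d_model, 255)  # non-empty occupancy patterns 1..255

    def forward(self, occupancy, level, octant):
        # Each input: LongTensor of shape (batch, CONTEXT_WINDOW, NUM_ANCESTORS).
        feats = torch.cat([self.occ_embed(occupancy),
                           self.level_embed(level),
                           self.octant_embed(octant)], dim=-1)
        tokens = feats.flatten(start_dim=2)          # (batch, N, d_model)
        return self.head(self.encoder(tokens))       # (batch, N, 255) logits


device = "cuda" if torch.cuda.is_available() else "cpu"  # paper: one TITAN Xp (12 GB)
model = OctAttentionSketch().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

# Dummy forward pass to show the expected tensor shapes.
occ = torch.randint(0, 256, (BATCH_SIZE, CONTEXT_WINDOW, NUM_ANCESTORS), device=device)
lvl = torch.randint(0, 16, (BATCH_SIZE, CONTEXT_WINDOW, NUM_ANCESTORS), device=device)
octant = torch.randint(0, 8, (BATCH_SIZE, CONTEXT_WINDOW, NUM_ANCESTORS), device=device)
logits = model(occ, lvl, octant)  # torch.Size([32, 1024, 255])
```

Note that the actual method additionally masks the attention so that each node only attends to already-decoded context and feeds the predicted distributions to an entropy coder; both steps are omitted here to keep the sketch focused on the reported hyperparameters.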