Context and Geometry Aware Voxel Transformer for Semantic Scene Completion
Authors: Zhu Yu, Runmin Zhang, Jiacheng Ying, Junchen Yu, Xiaohai Hu, Lun Luo, Si-Yuan Cao, Hui-Liang Shen
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that CGFormer achieves state-of-the-art performance on the SemanticKITTI and SSCBench-KITTI-360 benchmarks, attaining a mIoU of 16.87 and 20.05, as well as an IoU of 45.99 and 48.07, respectively. |
| Researcher Affiliation | Collaboration | Zhu Yu (1), Runmin Zhang (1), Jiacheng Ying (1), Junchen Yu (1), Xiaohai Hu (3), Lun Luo (4), Si-Yuan Cao (2,1), Hui-Liang Shen (1). Affiliations: (1) Zhejiang University; (2) Ningbo Innovation Center, Zhejiang University; (3) University of Washington; (4) HAOMO.AI Technology Co., Ltd. |
| Pseudocode | No | The paper describes the architecture and processes using diagrams and text, but it does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | https://github.com/pkqbajng/CGFormer |
| Open Datasets | Yes | We evaluate our CGFormer on two datasets: SemanticKITTI [1] and SSCBench-KITTI-360 [22]. |
| Dataset Splits | Yes | SemanticKITTI provides RGB images... The dataset includes 10 sequences for training, 1 sequence for validation, and 11 sequences for testing. SSCBench-KITTI-360 [22] offers 7 sequences for training, 1 sequence for validation, and 1 sequence for testing. |
| Hardware Specification | Yes | We train CGFormer for 25 epochs on 4 NVIDIA 4090 GPUs, with a batch size of 4. It approximately consumes 19 GB of GPU memory on each GPU during the training phase. |
| Software Dependencies | No | Consistent with previous research [13, 3, 47], we utilize a 2D UNet based on a pretrained EfficientNet-B7 [41] as the image backbone. ... Swin-T [30] is employed as the 2D backbone in the TPV-based branch. (A hedged backbone-loading sketch follows the table.) |
| Experiment Setup | Yes | We train CGFormer for 25 epochs on 4 NVIDIA 4090 GPUs, with a batch size of 4. It approximately consumes 19 GB of GPU memory on each GPU during the training phase. We employ the AdamW [32] optimizer with β1 = 0.9, β2 = 0.99 and set the maximum learning rate to 3 × 10⁻⁴. The cosine annealing learning rate strategy is adopted for the learning rate decay, where the cosine warmup strategy is applied for the first 5% of iterations. (A hedged optimizer/schedule sketch follows the table.) |
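
The paper does not document its software stack, so the snippet below is only a hedged sketch of how the quoted backbones could be instantiated. It assumes the `timm` library and its `tf_efficientnet_b7` and `swin_tiny_patch4_window7_224` checkpoints as stand-ins for the pretrained EfficientNet-B7 image backbone and the Swin-T backbone of the TPV-based branch; the authors' 2D UNet decoder and TPV-branch wiring are not reproduced here.

```python
import timm
import torch

# Assumed stand-ins for the backbones named in the paper; the surrounding
# 2D UNet decoder and the TPV-branch wiring of CGFormer are omitted.
image_backbone = timm.create_model(
    "tf_efficientnet_b7", pretrained=True, features_only=True
)
tpv_backbone = timm.create_model(
    "swin_tiny_patch4_window7_224", pretrained=True, num_classes=0
)

x = torch.randn(1, 3, 224, 224)          # dummy RGB input
multi_scale_feats = image_backbone(x)    # feature maps at several strides
print([f.shape for f in multi_scale_feats])
```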
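
The training configuration quoted in the Experiment Setup row maps onto a standard optimizer/scheduler pairing. Below is a minimal PyTorch sketch, not the authors' training code: the stand-in model, the total iteration count, and the linear warmup ramp (the paper mentions a cosine warmup over the first 5% of iterations) are assumptions for illustration.

```python
import math
import torch

# Stand-in model; CGFormer itself is not reproduced here.
model = torch.nn.Linear(10, 10)

max_lr = 3e-4
total_iters = 10_000                      # assumed total number of training iterations
warmup_iters = int(0.05 * total_iters)    # warmup over the first 5% of iterations

optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, betas=(0.9, 0.99))

def lr_lambda(it):
    # Linear ramp during warmup (a cosine-shaped ramp, as the paper states,
    # would be a drop-in change), then cosine annealing towards zero.
    if it < warmup_iters:
        return (it + 1) / warmup_iters
    progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for it in range(total_iters):
    x = torch.randn(4, 10)                # dummy batch in place of the real data loader
    loss = model(x).pow(2).mean()         # dummy loss in place of the SSC losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

With 25 epochs at batch size 4 across 4 GPUs, `total_iters` would follow from the actual dataset size; the value above is only a placeholder.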