Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

Authors: Zhu Yu, Runmin Zhang, Jiacheng Ying, Junchen Yu, Xiaohai Hu, Lun Luo, Si-Yuan Cao, Hui-Liang Shen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that CGFormer achieves state-of-the-art performance on the SemanticKITTI and SSCBench-KITTI-360 benchmarks, attaining a mIoU of 16.87 and 20.05, as well as an IoU of 45.99 and 48.07, respectively.
Researcher Affiliation | Collaboration | Zhu Yu¹, Runmin Zhang¹, Jiacheng Ying¹, Junchen Yu¹, Xiaohai Hu³, Lun Luo⁴, Si-Yuan Cao²,¹, Hui-Liang Shen¹ — ¹Zhejiang University, ²Ningbo Innovation Center, Zhejiang University, ³University of Washington, ⁴HAOMO.AI Technology Co., Ltd.
Pseudocode | No | The paper describes the architecture and processes using diagrams and text, but it does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | https://github.com/pkqbajng/CGFormer
Open Datasets | Yes | We evaluate our CGFormer on two datasets: SemanticKITTI [1] and SSCBench-KITTI-360 [22].
Dataset Splits | Yes | SemanticKITTI provides RGB images... The dataset includes 10 sequences for training, 1 sequence for validation, and 11 sequences for testing. SSCBench-KITTI-360 [22] offers 7 sequences for training, 1 sequence for validation, and 1 sequence for testing.
Hardware Specification | Yes | We train CGFormer for 25 epochs on 4 NVIDIA 4090 GPUs, with a batch size of 4. It approximately consumes 19 GB of GPU memory on each GPU during the training phase.
Software Dependencies | No | Consistent with previous research [13, 3, 47], we utilize a 2D UNet based on a pretrained EfficientNet-B7 [41] as the image backbone. ... Swin-T [30] is employed as the 2D backbone in the TPV-based branch.
Experiment Setup | Yes | We train CGFormer for 25 epochs on 4 NVIDIA 4090 GPUs, with a batch size of 4. It approximately consumes 19 GB of GPU memory on each GPU during the training phase. We employ the AdamW [32] optimizer with β1 = 0.9, β2 = 0.99 and set the maximum learning rate to 3 × 10⁻⁴. The cosine annealing learning rate strategy is adopted for the learning rate decay, where the cosine warmup strategy is applied for the first 5% of iterations.
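
The Dataset Splits row above only reports sequence counts (10 train / 1 val / 11 test for SemanticKITTI). The sketch below spells out the sequence IDs of the standard SemanticKITTI semantic scene completion split, which matches those counts; the specific IDs are an assumption based on the common protocol, not on the quoted excerpt.

```python
# Standard SemanticKITTI SSC split (sequence IDs follow the community convention;
# the excerpt above states only the counts: 10 train / 1 val / 11 test).
SEMANTIC_KITTI_SPLIT = {
    "train": ["00", "01", "02", "03", "04", "05", "06", "07", "09", "10"],  # 10 sequences
    "val":   ["08"],                                                         # 1 sequence
    "test":  [f"{i:02d}" for i in range(11, 22)],                            # 11 sequences
}

assert len(SEMANTIC_KITTI_SPLIT["train"]) == 10
assert len(SEMANTIC_KITTI_SPLIT["test"]) == 11
```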
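The Software Dependencies row names the pretrained 2D backbones but no library versions. A minimal sketch of how such backbones could be instantiated with torchvision follows; the official repository may build them through timm or mmdetection instead, so these calls are illustrative assumptions rather than the paper's actual setup.

```python
import torch
from torchvision import models

# Illustrative only: the paper reports a pretrained EfficientNet-B7 image backbone
# and a Swin-T backbone for the TPV-based branch; the official code may construct
# them differently (e.g. via timm), this just shows equivalent pretrained models.
effnet_b7 = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.IMAGENET1K_V1)
swin_t = models.swin_t(weights=models.Swin_T_Weights.IMAGENET1K_V1)

image = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    img_feats = effnet_b7.features(image)  # backbone features for the 2D UNet branch
    tpv_feats = swin_t.features(image)     # backbone features for the TPV-based branch
print(img_feats.shape, tpv_feats.shape)
```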
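The Experiment Setup row fully specifies the optimizer and learning-rate schedule. Below is a minimal sketch, assuming PyTorch's OneCycleLR as a stand-in for "cosine annealing with cosine warmup over the first 5% of iterations"; the released code may implement the schedule through its own training hooks, and the model here is a hypothetical placeholder.

```python
import torch

def build_optimization(model, steps_per_epoch, epochs=25):
    """Optimizer and schedule as reported: AdamW (beta1=0.9, beta2=0.99),
    max learning rate 3e-4, cosine warmup for the first 5% of iterations,
    then cosine annealing decay."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.99))
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=3e-4,
        total_steps=steps_per_epoch * epochs,
        pct_start=0.05,           # warmup over the first 5% of iterations
        anneal_strategy="cos",    # cosine warmup and cosine decay
        cycle_momentum=False,     # keep betas fixed at (0.9, 0.99) as reported
    )
    return optimizer, scheduler

# Hypothetical usage with a placeholder model (the paper trains with batch size 4
# on 4 NVIDIA 4090 GPUs); the scheduler is stepped once per training iteration.
model = torch.nn.Linear(8, 8)  # stand-in for CGFormer
optimizer, scheduler = build_optimization(model, steps_per_epoch=1000)
for _ in range(10):
    optimizer.step()
    scheduler.step()
```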