Sparse Cross-Scale Attention Network for Efficient LiDAR Panoptic Segmentation

Authors: Shuangjie Xu, Rui Wan, Maosheng Ye, Xiaoyi Zou, Tongyi Cao (pp. 2920-2928)

AAAI 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We compare our model with state-of-the-art methods and perform an ablation study to demonstrate the advantage of each module in SCAN.
Researcher Affiliation Collaboration 1The Hong Kong University of Science and Technology 2DEEPROUTE.AI
Pseudocode No The paper describes the network architecture with diagrams and text, but does not include formal pseudocode or algorithm blocks.
Open Source Code No The paper does not provide an explicit statement or link for open-source code for the described methodology.
Open Datasets Yes Semantic KITTI. Semantic KITTI (Behley et al. 2019; Behley, Milioto, and Stachniss 2020) is a challenging dataset, proposed to provide full 360-degree point-wise labels for the large-scale LiDAR data of the KITTI Odometry Benchmark (Geiger, Lenz, and Urtasun 2012). Nuscenes. The large-scale Nuscenes dataset (Caesar et al. 2019) has newly released the panoptic segmentation challenge.
Dataset Splits Yes Semantic KITTI... It contains 23201 scans with 3D semantic and instance annotations for training and 20351 for testing. Nuscenes... The dataset contains 1000 scenes, including 850 scenes for training and validation and 150 scenes for testing.
Hardware Specification Yes We implement the network in PyTorch (Paszke et al. 2017) and train the SCAN model on 8 NVIDIA 3090 GPUs for 40 epochs with Adam (Kingma and Ba 2014) and 1-Cycle Schedule (Smith 2017).
Software Dependencies No The paper mentions implementing the network in 'PyTorch' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup Yes Training. We fix the voxelization space to be limited in [[-48, 48], [-48, 48], [-3, 1.5]]. We do global rotation along the z axis in the range of [-π, π] and flip the points along the x, y, and x + y axes. Each augmentation is applied independently with a probability of 50%. In addition, we set the default scale s = [0.2, 0.2, 0.1] measured in metres, thus w = 120, h = 120 for the BEV sparse centroid distribution. The feature channels are set to C = 64 in the network, and we configure the GKA attention by setting the number of heads to 8, the attention depth to 2, and the channels of each head to 16, and by disabling causality inference. We implement the network in PyTorch (Paszke et al. 2017) and train the SCAN model on 8 NVIDIA 3090 GPUs for 40 epochs with Adam (Kingma and Ba 2014) and the 1-Cycle Schedule (Smith 2017). We set the batch size per GPU to 4 and the initial learning rate to 0.003. The learning rate first rises tenfold before the 16th epoch and then decays.
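The augmentation described in the setup row (a global z-axis rotation in [-π, π], plus independent 50%-probability flips across the x, y, and x + y axes) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `augment_points` and the (N, 4) point layout of x, y, z, intensity are assumptions.

```python
import numpy as np


def augment_points(points: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sketch of the per-scan augmentation described in the paper's setup.

    `points` is assumed to be an (N, 4) array of (x, y, z, intensity);
    only the x/y coordinates are transformed.
    """
    pts = points.copy()

    # Global rotation about the z axis, angle drawn from [-pi, pi].
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s],
                    [s, c]])
    pts[:, :2] = pts[:, :2] @ rot.T

    # Each flip is applied independently with probability 50%.
    if rng.random() < 0.5:          # reflect across the x axis
        pts[:, 1] = -pts[:, 1]
    if rng.random() < 0.5:          # reflect across the y axis
        pts[:, 0] = -pts[:, 0]
    if rng.random() < 0.5:          # reflect across the x + y diagonal
        pts[:, [0, 1]] = pts[:, [1, 0]]

    return pts
```

All four transforms are rigid in the x-y plane, so each point's distance from the z axis is preserved, which gives a quick sanity check when wiring this into a data loader.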