Sparse Cross-Scale Attention Network for Efficient LiDAR Panoptic Segmentation
Authors: Shuangjie Xu, Rui Wan, Maosheng Ye, Xiaoyi Zou, Tongyi Cao
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our model with state-of-the-art methods and perform an ablation study to demonstrate the advantage of each module in SCAN. |
| Researcher Affiliation | Collaboration | 1The Hong Kong University of Science and Technology 2DEEPROUTE.AI |
| Pseudocode | No | The paper describes the network architecture with diagrams and text, but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | Semantic KITTI. Semantic KITTI (Behley et al. 2019; Behley, Milioto, and Stachniss 2020) is a challenging dataset, proposed to provide full 360-degree point-wise labels for the large-scale LiDAR data of the KITTI Odometry Benchmark (Geiger, Lenz, and Urtasun 2012). Nuscenes. The large-scale Nuscenes dataset (Caesar et al. 2019) has newly released the panoptic segmentation challenge. |
| Dataset Splits | Yes | Semantic KITTI... It contains 23201 scans with 3D semantic and instance annotations for training and 20351 for testing. Nuscenes... The dataset contains 1000 scenes, including 850 scenes for training and validation and 150 scenes for testing. |
| Hardware Specification | Yes | We implement the network in PyTorch (Paszke et al. 2017) and train the SCAN model on 8 NVIDIA 3090 GPUs for 40 epochs with Adam (Kingma and Ba 2014) and 1-Cycle Schedule (Smith 2017). |
| Software Dependencies | No | The paper mentions implementing the network in PyTorch but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Training. We fix the voxelization space to be limited in [[−48, 48], [−48, 48], [−3, 1.5]]. We do global rotation along the z axis in the range of [−π, π] and flip the points along the x, y, and x + y axes. Each augmentation is applied independently with a probability of 50%. In addition, we set the default scale s = [0.2, 0.2, 0.1] measured in metres, thus w = 120, h = 120 for the BEV sparse centroid distribution. The feature channels are set to C = 64 in the network, and we configure the GKA attention by setting the number of heads to 8, attention depth to 2, channels of each head to 16, and disabling the causality inference. We implement the network in PyTorch (Paszke et al. 2017) and train the SCAN model on 8 NVIDIA 3090 GPUs for 40 epochs with Adam (Kingma and Ba 2014) and 1-Cycle Schedule (Smith 2017). We set the batch size per GPU as 4 and the initial learning rate to 0.003. The learning rate first rises tenfold before the 16-th epoch and then decays. |
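To make the reported augmentation and cropping pipeline concrete, the following is a minimal numpy sketch of what the quoted setup describes: a global z-axis rotation drawn from [−π, π], independent 50% flips along the x, y, and x + y axes, and a crop to the fixed voxelization space [[−48, 48], [−48, 48], [−3, 1.5]]. The exact flip conventions (which coordinate each "flip along axis" mirrors) are not specified in the paper and are assumptions here; the function name `augment` is illustrative, not from the paper's code.

```python
import numpy as np

# Voxelization bounds as stated in the paper: x, y in [-48, 48] m, z in [-3, 1.5] m.
VOXEL_RANGE = np.array([[-48.0, 48.0], [-48.0, 48.0], [-3.0, 1.5]])

def augment(points, rng):
    """Apply the described train-time augmentations to an (N, 3) xyz array.

    Returns an augmented copy cropped to the voxelization space.
    """
    pts = points.copy()

    # Global rotation about the z axis by a uniform angle in [-pi, pi].
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    pts = pts @ rot.T

    # Independent flips, each with probability 50%.
    # Axis conventions below are assumptions, not taken from the paper.
    if rng.random() < 0.5:            # flip along the x axis: mirror y
        pts[:, 1] = -pts[:, 1]
    if rng.random() < 0.5:            # flip along the y axis: mirror x
        pts[:, 0] = -pts[:, 0]
    if rng.random() < 0.5:            # flip along x + y: swap x and y (diagonal mirror)
        pts[:, [0, 1]] = pts[:, [1, 0]]

    # Keep only points inside the fixed voxelization space.
    inside = np.all((pts >= VOXEL_RANGE[:, 0]) & (pts <= VOXEL_RANGE[:, 1]), axis=1)
    return pts[inside]
```

Note that with the stated scale s = [0.2, 0.2, 0.1] the paper reports a 120 × 120 BEV sparse centroid distribution, which implies additional downsampling between the raw voxel grid and the centroid map; that step is outside the scope of this sketch.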