Context-Aware Transformer for 3D Point Cloud Automatic Annotation

Authors: Xiaoyan Qian, Chang Liu, Xiaojuan Qi, Siew-Chong Tan, Edmund Lam, Ngai Wong

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that CAT achieves new state-of-the-art performance compared to existing annotation methods on the KITTI benchmark, even without 3D segmentation, cylindrical object proposals generation or point cloud completion, and multi-modality information combination." and "We conduct a series of experiments to confirm the effectiveness of each module in our proposed model, namely, the local encoder, global encoder, and decoder."
Researcher Affiliation | Academia | The University of Hong Kong, Pokfulam, Hong Kong; {qianxy10, lcon7}@connect.hku.hk, {xjqi, sctan, elam, nwong}@eee.hku.hk
Pseudocode | No | The paper describes the model architecture and process in text and diagrams (Figure 2), but does not contain a formal pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the CAT methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "We adopt the KITTI Benchmark (Geiger, Lenz, and Urtasun 2012) for CAT evaluation." and "The KITTI dataset is one of the best-known benchmarks for 3D detection in autonomous driving (Geiger, Lenz, and Urtasun 2012)."
Dataset Splits | Yes | "For a fair comparison, we used the official split with 500 frames for training and 3769 for evaluation."
Hardware Specification | Yes | "We train CAT on a single RTX 3090 with a batch size of 24 for 1000 epochs."
Software Dependencies | No | The paper mentions "We implement CAT using PyTorch (Paszke et al. 2019)" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "The local encoder has N1 = 8 layers, each using a multi-headed self-attention with eight heads and an MLP with two linear layers and one ReLU nonlinearity. The global encoder with N2 = 3 layers closely follows the local encoder settings except that it is implemented to perform self-attention along the batch dimension. The decoder has three Transformer decoder layers composed of multi-headed self-attentions, cross-attentions, and MLPs. The prediction heads for box regression are two-layer MLPs with a hidden size of 1024. CAT is optimized using the Adam optimizer with the learning rate of 10^-4 decayed by a cosine annealing learning rate scheduler and a weight decay of 0.05. We train CAT on a single RTX 3090 with a batch size of 24 for 1000 epochs. We use standard data augmentations of random shifting, scaling, and flipping."
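
For readers attempting a re-implementation, the quoted setup maps onto standard PyTorch components. Below is a minimal sketch of how the reported layer counts, head counts, prediction head, optimizer, and scheduler could be wired up; the embedding width (256), the 7-parameter box output, and all module names are illustrative assumptions, not taken from the paper or any released code.

```python
# Hedged sketch of the reported CAT training configuration.
# Quantities quoted by the paper: 8 local encoder layers, 3 global encoder layers,
# 3 decoder layers, 8 attention heads, 2-layer MLP head with hidden size 1024,
# Adam (lr 1e-4, weight decay 0.05), cosine annealing, 1000 epochs, batch size 24.
# Everything else (d_model=256, 7-dim box output) is an assumption for illustration.
import torch
import torch.nn as nn

D_MODEL = 256  # assumed embedding width; not specified in the quoted text

# Local encoder: N1 = 8 Transformer layers, 8-head self-attention,
# feed-forward MLP of two linear layers with a ReLU nonlinearity.
local_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, activation="relu",
                               batch_first=True),
    num_layers=8,
)

# Global encoder: N2 = 3 layers with the same settings; the paper applies its
# self-attention along the batch dimension, shown here only as a plain stack.
global_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, activation="relu",
                               batch_first=True),
    num_layers=3,
)

# Decoder: three Transformer decoder layers (self-attention, cross-attention, MLP).
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=D_MODEL, nhead=8, activation="relu",
                               batch_first=True),
    num_layers=3,
)

# Prediction head for box regression: two-layer MLP with hidden size 1024.
# The 7-dim output (x, y, z, w, l, h, yaw) is an assumed parameterization.
box_head = nn.Sequential(
    nn.Linear(D_MODEL, 1024),
    nn.ReLU(),
    nn.Linear(1024, 7),
)

model = nn.ModuleDict({
    "local_encoder": local_encoder,
    "global_encoder": global_encoder,
    "decoder": decoder,
    "box_head": box_head,
})

# Optimizer and schedule as quoted: Adam, lr 1e-4, weight decay 0.05,
# cosine annealing over the 1000 training epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for epoch in range(1000):
    # ... forward/backward pass over the 500 annotated KITTI frames,
    # batch size 24, with random shifting, scaling, and flipping ...
    scheduler.step()
```

This sketch only mirrors the hyperparameters quoted above; how the three modules are connected to the point cloud inputs and to each other is described in the paper's Figure 2 and would need to be reconstructed from that description.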