Context-Aware Transformer for 3D Point Cloud Automatic Annotation
Authors: Xiaoyan Qian, Chang Liu, Xiaojuan Qi, Siew-Chong Tan, Edmund Lam, Ngai Wong
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that CAT achieves new state-of-the-art performance compared to existing annotation methods on the KITTI benchmark, even without 3D segmentation, cylindrical object proposals generation or point cloud completion, and multi-modality information combination. We conduct a series of experiments to confirm the effectiveness of each module in our proposed model, namely, the local encoder, global encoder, and decoder. |
| Researcher Affiliation | Academia | The University of Hong Kong, Pokfulam, Hong Kong {qianxy10, lcon7}@connect.hku.hk, {xjqi, sctan, elam, nwong}@eee.hku.hk |
| Pseudocode | No | The paper describes the model architecture and process in text and diagrams (Figure 2), but does not contain a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the CAT methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We adopt the KITTI Benchmark (Geiger, Lenz, and Urtasun 2012) for CAT evaluation. The KITTI dataset is one of the best-known benchmarks for 3D detection in autonomous driving (Geiger, Lenz, and Urtasun 2012). |
| Dataset Splits | Yes | For a fair comparison, we used the official split with 500 frames for training and 3769 for evaluation. |
| Hardware Specification | Yes | We train CAT on a single RTX 3090 with a batch size of 24 for 1000 epochs. |
| Software Dependencies | No | The paper mentions 'We implement CAT using PyTorch (Paszke et al. 2019)' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The local encoder has N1 = 8 layers, each using a multi-headed self-attention with eight heads and an MLP with two linear layers and one ReLU nonlinearity. The global encoder with N2 = 3 layers closely follows the local encoder settings except that it is implemented to perform self-attention along the batch dimension. The decoder has three Transformer decoder layers composed of multi-headed self-attentions, cross-attentions, and MLPs. The prediction heads for box regression are two-layer MLPs with a hidden size of 1024. CAT is optimized using the Adam optimizer with the learning rate of 10^-4 decayed by a cosine annealing learning rate scheduler and a weight decay of 0.05. We train CAT on a single RTX 3090 with a batch size of 24 for 1000 epochs. We use standard data augmentations of random shifting, scaling, and flipping. |
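
Since no source code is released, the layout quoted in the Experiment Setup row can be sketched in PyTorch as below. This is a minimal sketch, not the authors' implementation: the module names, the feature width `d_model`, the single box query, and the 7-parameter box output are assumptions made for illustration.

```python
# Minimal sketch of the CAT layout described above: an 8-layer local encoder,
# a 3-layer global encoder attending along the batch dimension, a 3-layer
# Transformer decoder, and a two-layer MLP box head (hidden size 1024).
# d_model, the single box query, and the 7-dim box output are assumptions.
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    """Eight-head self-attention followed by a two-layer MLP with ReLU."""

    def __init__(self, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, tokens, d_model)
        x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
        return self.norm2(x + self.mlp(x))


class CATSketch(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        # Local encoder: N1 = 8 layers of per-object point self-attention.
        self.local_encoder = nn.ModuleList([EncoderLayer(d_model) for _ in range(8)])
        # Global encoder: N2 = 3 layers; attention runs along the batch
        # dimension, so the point and batch axes are swapped around it.
        self.global_encoder = nn.ModuleList([EncoderLayer(d_model) for _ in range(3)])
        # Decoder: three standard Transformer decoder layers
        # (self-attention, cross-attention, MLP).
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(
                d_model, nhead=8, dim_feedforward=1024, batch_first=True
            ),
            num_layers=3,
        )
        # Box-regression head: two-layer MLP with a hidden size of 1024.
        self.box_head = nn.Sequential(
            nn.Linear(d_model, 1024), nn.ReLU(), nn.Linear(1024, 7)
        )
        self.query = nn.Embedding(1, d_model)  # hypothetical single box query

    def forward(self, feats):  # feats: (batch, points, d_model)
        for layer in self.local_encoder:
            feats = layer(feats)
        feats = feats.transpose(0, 1)  # attend across the batch dimension
        for layer in self.global_encoder:
            feats = layer(feats)
        feats = feats.transpose(0, 1)
        query = self.query.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        decoded = self.decoder(query, feats)  # (batch, 1, d_model)
        return self.box_head(decoded.squeeze(1))  # (batch, 7) box parameters
```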
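
The quoted optimization settings (Adam at a learning rate of 10^-4, weight decay 0.05, cosine annealing, batch size 24, 1000 epochs) map onto standard PyTorch calls roughly as follows, reusing the `CATSketch` module from the sketch above. The dummy batch and the SmoothL1 regression loss are placeholders; the paper's actual losses, data augmentations, and KITTI pipeline are not reproduced here.

```python
# Rough mapping of the quoted training hyperparameters onto standard PyTorch
# calls. The dummy batch and the SmoothL1 loss are placeholders, not the
# paper's pipeline.
import torch

model = CATSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
criterion = torch.nn.SmoothL1Loss()  # placeholder box-regression loss

dummy_points = torch.randn(24, 128, 256)  # batch of 24 proposals, 128 points, 256-dim features (assumed shapes)
dummy_boxes = torch.randn(24, 7)          # placeholder box targets

for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(dummy_points), dummy_boxes)
    loss.backward()
    optimizer.step()
    scheduler.step()  # cosine decay of the 1e-4 learning rate over 1000 epochs
```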