Context-Aware Transformer for 3D Point Cloud Automatic Annotation

Authors: Xiaoyan Qian, Chang Liu, Xiaojuan Qi, Siew-Chong Tan, Edmund Lam, Ngai Wong

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that CAT achieves new state-of-the-art performance compared to existing annotation methods on the KITTI benchmark, even without 3D segmentation, cylindrical object proposals generation or point cloud completion, and multi-modality information combination." and "We conduct a series of experiments to confirm the effectiveness of each module in our proposed model, namely, the local encoder, global encoder, and decoder."
Researcher Affiliation | Academia | The University of Hong Kong, Pokfulam, Hong Kong; {qianxy10, lcon7}@connect.hku.hk, {xjqi, sctan, elam, nwong}@eee.hku.hk
Pseudocode | No | The paper describes the model architecture and process in text and diagrams (Figure 2), but does not contain a formal pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the CAT methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | "We adopt the KITTI Benchmark (Geiger, Lenz, and Urtasun 2012) for CAT evaluation." and "The KITTI dataset is one of the best-known benchmarks for 3D detection in autonomous driving (Geiger, Lenz, and Urtasun 2012)."
Dataset Splits | Yes | "For a fair comparison, we used the official split with 500 frames for training and 3769 for evaluation."
Hardware Specification | Yes | "We train CAT on a single RTX 3090 with a batch size of 24 for 1000 epochs."
Software Dependencies | No | The paper mentions "We implement CAT using PyTorch (Paszke et al. 2019)" but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | "The local encoder has N1 = 8 layers, each using a multi-headed self-attention with eight heads and an MLP with two linear layers and one ReLU nonlinearity. The global encoder with N2 = 3 layers closely follows the local encoder settings except that it is implemented to perform self-attention along the batch dimension. The decoder has three Transformer decoder layers composed of multi-headed self-attentions, cross-attentions, and MLPs. The prediction heads for box regression are two-layer MLPs with a hidden size of 1024. CAT is optimized using the Adam optimizer with the learning rate of 10^-4 decayed by a cosine annealing learning rate scheduler and a weight decay of 0.05. We train CAT on a single RTX 3090 with a batch size of 24 for 1000 epochs. We use standard data augmentations of random shifting, scaling, and flipping."
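
For readers attempting a re-implementation, the quoted setup maps onto standard PyTorch components. Below is a minimal sketch of how the reported layer counts, head counts, prediction head, optimizer, and scheduler could be wired up; the embedding width (256), the 7-parameter box output, and all module names are illustrative assumptions, not taken from the paper or any released code.

```python
# Hedged sketch of the reported CAT training configuration.
# Quantities quoted by the paper: 8 local encoder layers, 3 global encoder layers,
# 3 decoder layers, 8 attention heads, 2-layer MLP head with hidden size 1024,
# Adam (lr 1e-4, weight decay 0.05), cosine annealing, 1000 epochs, batch size 24.
# Everything else (d_model=256, 7-dim box output) is an assumption for illustration.
import torch
import torch.nn as nn

D_MODEL = 256  # assumed embedding width; not specified in the quoted text

# Local encoder: N1 = 8 Transformer layers, 8-head self-attention,
# feed-forward MLP of two linear layers with a ReLU nonlinearity.
local_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, activation="relu",
                               batch_first=True),
    num_layers=8,
)

# Global encoder: N2 = 3 layers with the same settings; the paper applies its
# self-attention along the batch dimension, shown here only as a plain stack.
global_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8, activation="relu",
                               batch_first=True),
    num_layers=3,
)

# Decoder: three Transformer decoder layers (self-attention, cross-attention, MLP).
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=D_MODEL, nhead=8, activation="relu",
                               batch_first=True),
    num_layers=3,
)

# Prediction head for box regression: two-layer MLP with hidden size 1024.
# The 7-dim output (x, y, z, w, l, h, yaw) is an assumed parameterization.
box_head = nn.Sequential(
    nn.Linear(D_MODEL, 1024),
    nn.ReLU(),
    nn.Linear(1024, 7),
)

model = nn.ModuleDict({
    "local_encoder": local_encoder,
    "global_encoder": global_encoder,
    "decoder": decoder,
    "box_head": box_head,
})

# Optimizer and schedule as quoted: Adam, lr 1e-4, weight decay 0.05,
# cosine annealing over the 1000 training epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for epoch in range(1000):
    # ... forward/backward pass over the 500 annotated KITTI frames,
    # batch size 24, with random shifting, scaling, and flipping ...
    scheduler.step()
```

This sketch only mirrors the hyperparameters quoted above; how the three modules are connected to the point cloud inputs and to each other is described in the paper's Figure 2 and would need to be reconstructed from that description.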