Text to Point Cloud Localization with Relation-Enhanced Transformer

Authors: Guangzhi Wang, Hehe Fan, Mohan Kankanhalli

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the KITTI360Pose dataset demonstrate that our approach surpasses the previous state-of-the-art method by large margins. We validated the effectiveness of our method on the KITTI360Pose benchmark (Kolmet et al. 2022). Extensive experiments demonstrate that the proposed method can surpass the previous approach by a large margin, leading to new state-of-the-art results. Additional ablation studies further corroborate the effectiveness of each component in the proposed method.
Researcher Affiliation | Academia | Guangzhi Wang (Institute of Data Science, National University of Singapore); Hehe Fan and Mohan Kankanhalli (School of Computing, National University of Singapore)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code available at: https://github.com/daoyuan98/text2pos-ret
Open Datasets | Yes | We evaluate our method on the recently proposed KITTI360Pose dataset (Kolmet et al. 2022).
Dataset Splits | Yes | We follow (Kolmet et al. 2022) to use five scenes for training, one for validation, and the remaining three for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions the AdamW and Adam optimizers but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | For the coarse stage, we trained the model with the AdamW optimizer (Loshchilov and Hutter 2018) at a learning rate of 2e-4. The models are trained for a total of 18 epochs, and the learning rate is decayed by a factor of 10 at the 9th epoch. The hyperparameter α is set to 0.35. For the fine stage, we first train the matcher at a learning rate of 5e-4 for a total of 16 epochs. Afterwards, we fix the matcher and train the regressor on the matching results for 10 epochs at a learning rate of 1e-4. The regressor is a 3-layer Multi-Layer Perceptron. Both steps use the Adam optimizer (Kingma and Ba 2014). The RET has 2 encoder layers for both the point cloud and linguistic branches, each using the Relation-enhanced Self-Attention (RSA) mechanism with 4 heads and a hidden dimension of 2048. (A training-schedule sketch based on these values appears after this table.)
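
For readers trying to reproduce the schedule in the Experiment Setup row, below is a minimal PyTorch-style sketch of the two training stages. The model classes, their training_step helpers, the data loader, and all layer widths are hypothetical placeholders; only the optimizer choices, learning rates, epoch counts, step decay, and the α value are taken from the paper's description.

```python
# Hedged sketch of the reported training schedule. The matcher/retriever classes,
# their training_step helpers, and all layer widths are hypothetical placeholders;
# only the optimizers, learning rates, epoch counts, decay step, and alpha come
# from the paper's description.
import torch
import torch.nn as nn


class PoseRegressor(nn.Module):
    """Fine-stage regressor: a 3-layer MLP (layer widths are assumptions)."""

    def __init__(self, in_dim=256, hidden_dim=256, out_dim=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.mlp(x)


def train_coarse(model, loader):
    """Coarse stage: AdamW, lr 2e-4, 18 epochs, lr divided by 10 at epoch 9."""
    opt = torch.optim.AdamW(model.parameters(), lr=2e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[9], gamma=0.1)
    alpha = 0.35  # the paper's alpha; its exact role in the loss is not restated here
    for _ in range(18):
        for batch in loader:
            loss = model.training_step(batch, alpha=alpha)  # hypothetical helper
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()


def train_fine(matcher, regressor, loader):
    """Fine stage: train the matcher, then freeze it and train the regressor."""
    # Step 1: matcher with Adam, lr 5e-4, 16 epochs.
    opt_m = torch.optim.Adam(matcher.parameters(), lr=5e-4)
    for _ in range(16):
        for batch in loader:
            loss = matcher.training_step(batch)  # hypothetical helper
            opt_m.zero_grad()
            loss.backward()
            opt_m.step()

    # Step 2: freeze the matcher; train the regressor with Adam, lr 1e-4, 10 epochs.
    for p in matcher.parameters():
        p.requires_grad_(False)
    opt_r = torch.optim.Adam(regressor.parameters(), lr=1e-4)
    for _ in range(10):
        for batch in loader:
            with torch.no_grad():
                matches = matcher(batch)  # matching results are held fixed
            loss = regressor.training_step(batch, matches)  # hypothetical helper
            opt_r.zero_grad()
            loss.backward()
            opt_r.step()
```

The RET encoder configuration quoted above (2 encoder layers per branch, 4-head RSA, hidden dimension 2048) is not shown here; in a full reproduction it would live inside the coarse retriever and matcher classes, whose internals the released code at https://github.com/daoyuan98/text2pos-ret defines authoritatively.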