Text to Point Cloud Localization with Relation-Enhanced Transformer
Authors: Guangzhi Wang, Hehe Fan, Mohan Kankanhalli
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the KITTI360Pose dataset demonstrate that our approach surpasses the previous state-of-the-art method by large margins. We validated the effectiveness of our method on the KITTI360Pose benchmark (Kolmet et al. 2022). Extensive experiments demonstrate that the proposed method can surpass the previous approach by a large margin, leading to new state-of-the-art results. Additional ablation studies further corroborate the effectiveness of each component in the proposed method. |
| Researcher Affiliation | Academia | Guangzhi Wang (1), Hehe Fan (2), Mohan Kankanhalli (2); (1) Institute of Data Science, National University of Singapore; (2) School of Computing, National University of Singapore |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at: https://github.com/daoyuan98/text2pos-ret |
| Open Datasets | Yes | We evaluate our method on the recently proposed KITTI360Pose dataset (Kolmet et al. 2022) |
| Dataset Splits | Yes | We follow (Kolmet et al. 2022) to use five scenes for training, one for validation, and the remaining three for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions optimizers such as the AdamW optimizer and the Adam optimizer but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For the coarse stage, we trained the model with the AdamW optimizer (Loshchilov and Hutter 2018) with a learning rate of 2e-4. The models are trained for a total of 18 epochs, with the learning rate decayed by a factor of 10 at the 9th epoch. α is set to 0.35. For the fine stage, we first train the matcher with a learning rate of 5e-4 for a total of 16 epochs. Afterwards, we fix the matcher and train the regressor based on the matching results for 10 epochs with a learning rate of 1e-4. The regressor is formulated as a 3-layer Multi-Layer Perceptron. Both steps adopt the Adam optimizer (Kingma and Ba 2014). The RET has 2 encoder layers for both the point cloud part and the linguistic part, each utilizing the Relation-enhanced Self-Attention (RSA) mechanism with 4 heads and a hidden dimension of 2048. |
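
The experiment-setup row above can be summarized as a minimal PyTorch sketch, shown below. It is an illustration of the reported hyperparameters only, not the authors' implementation: a standard `nn.TransformerEncoder` stands in for the RET encoder (the relation-enhanced attention itself is not reproduced), `D_MODEL = 256` and the regressor's layer widths are assumptions, and "hidden dimension 2048" is interpreted here as the feed-forward width.

```python
# Hedged sketch of the reported training configuration (not the released code).
import torch
import torch.nn as nn

D_MODEL = 256  # assumed embedding size; not stated in the excerpt

# Stand-in for the RET encoder: 2 layers, 4 attention heads, FFN width 2048.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL, nhead=4, dim_feedforward=2048, batch_first=True
)
ret_encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Coarse stage: AdamW, lr 2e-4, 18 epochs, lr divided by 10 at epoch 9.
coarse_opt = torch.optim.AdamW(ret_encoder.parameters(), lr=2e-4)
coarse_sched = torch.optim.lr_scheduler.MultiStepLR(coarse_opt, milestones=[9], gamma=0.1)
for epoch in range(18):
    # ... coarse-stage training loop over KITTI360Pose would run here ...
    coarse_sched.step()

# Fine stage, step 1: train the matcher with Adam, lr 5e-4, for 16 epochs.
matcher = nn.Linear(D_MODEL, D_MODEL)  # placeholder for the fine-stage matcher
matcher_opt = torch.optim.Adam(matcher.parameters(), lr=5e-4)

# Fine stage, step 2: freeze the matcher, then train the 3-layer MLP regressor
# with Adam, lr 1e-4, for 10 epochs.
for p in matcher.parameters():
    p.requires_grad = False
regressor = nn.Sequential(
    nn.Linear(D_MODEL, D_MODEL), nn.ReLU(),
    nn.Linear(D_MODEL, D_MODEL), nn.ReLU(),
    nn.Linear(D_MODEL, 2),  # assumed 2-D position output; not specified in the excerpt
)
regressor_opt = torch.optim.Adam(regressor.parameters(), lr=1e-4)
```

The two-stage split mirrors the paper's description: the coarse stage is trained end to end with its own schedule, while the fine stage first fits the matcher and only then optimizes the regressor on the frozen matcher's outputs.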