Activating Self-Attention for Multi-Scene Absolute Pose Regression
Authors: Miso Lee, Jihwan Kim, Jae-Pil Heo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our solution successfully recovers self-attention by preventing the distortion of the query-key space and keeping the high capacity of the self-attention map [22]. As a result, our model outperforms existing MS-APR methods in both outdoor and indoor scenes without additional memory during inference, upholding the original purpose of MS-APR. |
| Researcher Affiliation | Academia | Miso Lee (Sungkyunkwan University, dlalth557@skku.edu); Jihwan Kim (Sungkyunkwan University, damien@skku.edu); Jae-Pil Heo (Sungkyunkwan University, jaepilheo@skku.edu) |
| Pseudocode | No | The paper describes the proposed methods in text and equations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include the code in the supplemental materials. |
| Open Datasets | Yes | Datasets. We train and evaluate the model on outdoor and indoor datasets [8, 27], which include RGB images labeled with 6-DoF camera poses. Firstly, we use the Cambridge Landmarks dataset, which consists of six outdoor scenes scaled from 875 m² to 5,600 m². Each scene contains 200 to 1,500 training samples. ... On the other hand, we use the 7Scenes dataset, which consists of seven indoor scenes scaled from 1 m² to 18 m². Each scene includes 1,000 to 7,000 images. |
| Dataset Splits | No | The paper mentions using training data and evaluation on datasets but does not explicitly specify the proportion or number of samples allocated for validation splits. |
| Hardware Specification | Yes | We train the model with a single RTX3090 GPU |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' but does not specify version numbers for any programming languages, libraries, or other software components used in the implementation. |
| Experiment Setup | Yes | We train the model with a single RTX3090 GPU, Adam optimizer with β1 = 0.9, β2 = 0.999, ϵ = 10⁻¹⁰, and a batch size of 8. For the 7Scenes dataset, we train the model for 30 epochs with an initial learning rate of 1 × 10⁻⁴, reducing the learning rate by 1/10 every 10 epochs. In the case of the Cambridge Landmarks dataset, we train the model for 500 epochs with an initial learning rate of 1 × 10⁻⁴, reducing the learning rate by 1/10 every 200 epochs. ... Both for the position and orientation transformer encoder-decoder, the number of layers L is 6 and the number of heads H is 8. Lastly, we set the λ_aux for our query-key alignment loss as 0.1. (A minimal training-setup sketch follows the table.) |
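
The hyperparameters quoted in the Experiment Setup row map directly onto a standard optimizer-plus-step-schedule configuration. The sketch below is a minimal illustration assuming a PyTorch implementation; it is not the authors' released code. `model` is a stand-in for the paper's separate position and orientation transformer encoder-decoders (L = 6, H = 8 as reported; `d_model` is an assumption), and `lambda_aux` is an illustrative name for the query-key alignment loss weight.

```python
import torch

# Stand-in for the paper's position/orientation transformer encoder-decoders:
# L = 6 layers and H = 8 heads as reported; d_model=256 is an assumption.
model = torch.nn.Transformer(
    d_model=256, nhead=8,
    num_encoder_layers=6, num_decoder_layers=6,
    batch_first=True,
)

# Adam with beta1 = 0.9, beta2 = 0.999, eps = 1e-10, initial LR 1e-4, as quoted.
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-10
)

# Learning rate decays by 1/10: every 10 epochs for 7Scenes (30 epochs total),
# every 200 epochs for Cambridge Landmarks (500 epochs total).
dataset = "7scenes"  # or "cambridge"
num_epochs, lr_step = (30, 10) if dataset == "7scenes" else (500, 200)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_step, gamma=0.1)

batch_size = 8
lambda_aux = 0.1  # weight on the query-key alignment auxiliary loss
```

Under this reading, `scheduler.step()` would be called once per epoch, matching the per-epoch decay schedule described in the paper.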