Deep Homography Estimation for Visual Place Recognition
Authors: Feng Lu, Shuting Dong, Lijun Zhang, Bingxi Liu, Xiangyuan Lan, Dongmei Jiang, Chun Yuan
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets show that our method can outperform several state-of-the-art methods. And it is more than one order of magnitude faster than the mainstream hierarchical VPR methods using RANSAC. The code is released at https://github.com/Lu-Feng/DHE-VPR. |
| Researcher Affiliation | Academia | Feng Lu1,2, Shuting Dong1,2, Lijun Zhang3, Bingxi Liu2,4, Xiangyuan Lan2*, Dongmei Jiang2, Chun Yuan1,2* 1Tsinghua Shenzhen International Graduate School, Tsinghua University 2Peng Cheng Laboratory 3Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences 4Southern University of Science and Technology {lf22@mails, dst21@mails, yuanc@sz}.tsinghua.edu.cn, zhanglijun@cigit.ac.cn, {liubx, lanxy, jiangdm}@pcl.ac.cn |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/Lu-Feng/DHE-VPR. |
| Open Datasets | Yes | We conduct experiments using multiple VPR datasets: MSLS (Warburg et al. 2020), Pitts30k (Torii et al. 2013), Nordland (downsampled test set with 224x224 image size) (Olid et al. 2018), and St. Lucia (Berton et al. 2022). |
| Dataset Splits | Yes | We conduct several ablation experiments on the Pitts30k and MSLS (val) datasets to validate the design of our DHE network and training strategy. |
| Hardware Specification | Yes | Experiments are implemented using PyTorch on an NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | The re-projection error threshold θ of the inlier is set to 1.5 times the patch size for RANSAC, and 3 times the patch size for geometric verification using DHE (in inference). The margin m in Eq. 9 is set to 0.1, and the weight λ in Eq. 11 is 100. Experiments are implemented using PyTorch on an NVIDIA GeForce RTX 3090 GPU. For the initialization of the DHE network, the Adam optimizer is used with learning rate = 0.0001 (multiplied by 0.8 after every 5 epochs) and batch size = 16. We train the network for 100 epochs (2k iterations per epoch) on MSLS-train. The implementation of the backbone initialization and the fine-tuning of the entire model basically follows the benchmark (Berton et al. 2022), with learning rate = 0.00001 and batch size = 4. For the backbone initialization, we train CCT-14 on MSLS-train for MSLS, Nordland, and St. Lucia, and further train it on Pitts30k-train for Pitts30k. For fine-tuning, the DHE network and the last 2 encoder layers in the backbone are updatable. The model for Pitts30k is fine-tuned on Pitts30k-train for 40 epochs (5k iterations per epoch), while the model for the others is fine-tuned on MSLS-train for 2 epochs (10k iterations per epoch). We use 2 hard negative images in a triplet. |
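The setup row quotes a re-projection error threshold of 1.5× the patch size for RANSAC and 3× the patch size for DHE-based geometric verification at inference. A minimal sketch of such an inlier count is below; the function names and the plain-Python homography helper are illustrative assumptions, not the paper's implementation.

```python
import math

def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography given as row-major nested lists."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def count_inliers(H, src_pts, dst_pts, patch_size, theta):
    """Count correspondences whose re-projection error is below theta * patch_size.

    theta = 1.5 would correspond to the quoted RANSAC setting,
    theta = 3.0 to geometric verification with the DHE network at inference.
    """
    threshold = theta * patch_size
    inliers = 0
    for (sx, sy), (dx, dy) in zip(src_pts, dst_pts):
        px, py = apply_homography(H, (sx, sy))
        if math.hypot(px - dx, py - dy) < threshold:
            inliers += 1
    return inliers
```

With an identity homography and a 16-pixel patch, a correspondence with 33 pixels of error fails the 1.5× threshold (24 px) but passes the 3× threshold (48 px), illustrating how the looser DHE-inference setting admits more inliers.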
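The DHE-network initialization quoted above uses Adam with a learning rate of 0.0001, multiplied by 0.8 after every 5 epochs. A sketch of that stepped decay (the function name is illustrative):

```python
def dhe_init_lr(epoch, base_lr=1e-4, gamma=0.8, step=5):
    """Learning rate in effect at a given epoch: base_lr * gamma ** (epoch // step).

    Matches the quoted schedule: 1e-4 for epochs 0-4, 8e-5 for epochs 5-9, etc.
    """
    return base_lr * gamma ** (epoch // step)
```

In PyTorch this decay is conventionally expressed as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.8)`; whether the authors used that scheduler or a manual update is not stated in the paper.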