Deep Homography Estimation for Visual Place Recognition

Authors: Feng Lu, Shuting Dong, Lijun Zhang, Bingxi Liu, Xiangyuan Lan, Dongmei Jiang, Chun Yuan

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets show that our method can outperform several state-of-the-art methods, and it is more than one order of magnitude faster than the mainstream hierarchical VPR methods using RANSAC. The code is released at https://github.com/Lu-Feng/DHE-VPR.
Researcher Affiliation | Academia | Feng Lu (1,2), Shuting Dong (1,2), Lijun Zhang (3), Bingxi Liu (2,4), Xiangyuan Lan (2)*, Dongmei Jiang (2), Chun Yuan (1,2)*. (1) Tsinghua Shenzhen International Graduate School, Tsinghua University; (2) Peng Cheng Laboratory; (3) Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences; (4) Southern University of Science and Technology. {lf22@mails, dst21@mails, yuanc@sz}.tsinghua.edu.cn, zhanglijun@cigit.ac.cn, {liubx, lanxy, jiangdm}@pcl.ac.cn
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is released at https://github.com/Lu-Feng/DHE-VPR.
Open Datasets | Yes | We conduct experiments using multiple VPR datasets: MSLS (Warburg et al. 2020), Pitts30k (Torii et al. 2013), Nordland (downsampled test set with 224x224 image size) (Olid et al. 2018), and St. Lucia (Berton et al. 2022).
Dataset Splits | Yes | We conduct several ablation experiments on the Pitts30k and MSLS (val) datasets to validate the design of our DHE network and training strategy.
Hardware Specification | Yes | Experiments are implemented using PyTorch on an NVIDIA GeForce RTX 3090 GPU.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | The re-projection error threshold θ of the inlier is set to 1.5 times the patch size for RANSAC, and 3 times the patch size for geometric verification using DHE (in inference). The margin m in Eq. 9 is set to 0.1, and the weight λ in Eq. 11 is 100. Experiments are implemented using PyTorch on an NVIDIA GeForce RTX 3090 GPU. For the initialization of the DHE network, the Adam optimizer is used with learning rate = 0.0001 (multiplied by 0.8 after every 5 epochs) and batch size = 16. We train the network for 100 epochs (2k iterations per epoch) on MSLS-train. The implementation of the backbone initialization and the fine-tuning of the entire model basically follows the benchmark (Berton et al. 2022), with learning rate = 0.00001 and batch size = 4. For the backbone initialization, we train CCT-14 on MSLS-train for MSLS, Nordland, and St. Lucia, and further train it on Pitts30k-train for Pitts30k. For fine-tuning, the DHE network and the last 2 encoder layers in the backbone are updatable. The model for Pitts30k is fine-tuned on Pitts30k-train for 40 epochs (5k iterations per epoch), while the model for the others is fine-tuned on MSLS-train for 2 epochs (10k iterations per epoch). We use 2 hard negative images in a triplet.
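
The inlier thresholds quoted in the Experiment Setup row (θ = 1.5 × patch size for RANSAC, 3 × patch size for DHE at inference) imply a re-projection inlier test of roughly the following form. This is a minimal sketch, not the released code: `count_inliers` is a hypothetical helper, and the patch size of 16 is an illustrative assumption, since the paper only states θ relative to the patch size.

```python
import torch

def count_inliers(pts_src, pts_dst, H, theta):
    """Count matches whose re-projection error under homography H is below theta.

    pts_src, pts_dst: (N, 2) matched keypoint coordinates; H: (3, 3) homography.
    """
    ones = torch.ones(pts_src.shape[0], 1)
    proj = torch.cat([pts_src, ones], dim=1) @ H.T  # project source points
    proj = proj[:, :2] / proj[:, 2:3]               # back to inhomogeneous coords
    err = torch.linalg.norm(proj - pts_dst, dim=1)  # per-match re-projection error
    return int((err < theta).sum())

patch_size = 16                   # assumption for illustration only
theta_ransac = 1.5 * patch_size   # tighter threshold used with RANSAC
theta_dhe = 3.0 * patch_size      # looser threshold used with DHE at inference
```

Candidate places would then be re-ranked by their inlier counts; the paper's exact scoring rule is not restated in this report.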
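Eq. 9 and Eq. 11 themselves are not quoted here, so the sketch below only shows how the stated hyperparameters (margin m = 0.1, weight λ = 100, 2 hard negatives per triplet) would slot into a generic triplet margin loss plus a weighted auxiliary term; the structure is inferred from the quoted values, and the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def combined_loss(anchor, positive, negatives, aux_term, m=0.1, lam=100.0):
    """Generic triplet margin loss over hard negatives plus a weighted extra term."""
    d_pos = F.pairwise_distance(anchor, positive)
    loss = torch.zeros(())
    for neg in negatives:  # the paper uses 2 hard negative images per triplet
        d_neg = F.pairwise_distance(anchor, neg)
        loss = loss + F.relu(d_pos - d_neg + m).mean()
    return loss + lam * aux_term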
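Likewise, the optimizer settings for the DHE-network initialization (Adam, learning rate 1e-4 multiplied by 0.8 every 5 epochs, batch size 16, 100 epochs of 2k iterations on MSLS-train) map directly onto standard PyTorch components. In the sketch below the model, inputs, and loss are dummy placeholders; only the schedule reflects the paper.

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 8)  # dummy stand-in for the DHE network; shapes are illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# StepLR multiplies the learning rate by 0.8 after every 5 epochs, as reported.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.8)

for epoch in range(100):               # 100 epochs on MSLS-train
    for _ in range(2000):              # 2k iterations per epoch
        x = torch.randn(16, 768)       # batch size 16 (dummy inputs)
        loss = model(x).pow(2).mean()  # placeholder loss; see Eqs. 9 and 11 in the paper
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```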