Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

Authors: Xiaolong Wang, Runsen Xu, Zhuofan Cui, Zeyu Wan, Yu Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we first introduce the two datasets used, the evaluation metrics, and the implementation details of our network. We then compare the performance of our HC-Net to the state of the art and examine its ability to generalize to new measurements within the same areas, across different areas, and across datasets. Finally, we present ablation studies and a computational efficiency analysis.
Researcher Affiliation | Academia | Xiaolong Wang (1), Runsen Xu (3), Zhuofan Cui (1), Zeyu Wan (1), Yu Zhang (1, 2). (1) College of Control Science and Engineering, Zhejiang University; (2) Key Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province; (3) The Chinese University of Hong Kong.
Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code | Yes | The code is available at https://github.com/xlwangDev/HC-Net.
Open Datasets | Yes | VIGOR dataset [42] contains geo-tagged ground-level panoramas and aerial images collected in four cities in the US. Each aerial patch corresponds to a ground area of approximately 70 m × 70 m... KITTI dataset [8] contains ground-level images captured by a moving vehicle with a forward-facing viewpoint, which is a restricted viewpoint. [24] augments the dataset with aerial images.
Dataset Splits | Yes | For validation and hyperparameter tuning, we randomly select 20% of the data from the training set, as done in [36, 12, 35].
Hardware Specification | Yes | Table 4 compares model parameters, inference memory, per-frame inference time, and mean localization error on the VIGOR dataset using a 12th Gen Intel(R) Core(TM) i5-12490F processor, 16 GB memory, and an NVIDIA RTX 3050 GPU.
Software Dependencies | No | PyTorch is used for network implementation, and training is done using the AdamW [17] optimizer with a maximum learning rate of 3.5 × 10^-4. The network is trained with a batch size of 16 for 180,000 iterations. We set the search radius of the correlation updater r = 4 and set α1 = 0.1, α2 = 10, α3 = 1.0, τ = 4 in the loss function.
Experiment Setup | Yes | Our network uses EfficientNet-B0 [29] with pretrained weights on ImageNet [5] as both the ground and aerial feature extractors, with non-shared weights. The satellite image and the bird's-eye-view (BEV) image transformed from the ground image both have a size of 512 × 512 on both the VIGOR [42] and KITTI [8] datasets. PyTorch is used for network implementation, and training is done using the AdamW [17] optimizer with a maximum learning rate of 3.5 × 10^-4. The network is trained with a batch size of 16 for 180,000 iterations. We set the search radius of the correlation updater r = 4 and set α1 = 0.1, α2 = 10, α3 = 1.0, τ = 4 in the loss function.
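
For readers who want the reported setup in one place, the following is a minimal sketch that collects the hyperparameters from the table above into a plain Python configuration and illustrates the 20% random validation hold-out. It is not taken from the HC-Net repository: the class name `TrainConfig`, its field names, the helper `split_train_val`, and the random seed are all hypothetical, and the learning rate is read as 3.5 × 10^-4.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values reported in the paper; the field names are hypothetical."""
    backbone: str = "EfficientNet-B0"  # ImageNet-pretrained, non-shared weights per view
    image_size: int = 512              # satellite and BEV inputs are 512 x 512
    optimizer: str = "AdamW"
    max_lr: float = 3.5e-4             # maximum learning rate, read as 3.5 x 10^-4
    batch_size: int = 16
    iterations: int = 180_000
    search_radius: int = 4             # r in the correlation updater
    alpha1: float = 0.1                # loss weights alpha1..alpha3 and tau
    alpha2: float = 10.0
    alpha3: float = 1.0
    tau: float = 4.0
    val_fraction: float = 0.2          # share of training data held out for validation


def split_train_val(sample_ids, val_fraction, seed=0):
    """Randomly hold out val_fraction of the training samples for validation.

    The seed is an assumption for repeatability; the paper does not report one.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


cfg = TrainConfig()
# Toy example with 1000 placeholder sample indices instead of real dataset entries.
train_ids, val_ids = split_train_val(range(1000), cfg.val_fraction)
print(len(train_ids), len(val_ids))
```

Keeping the values in a frozen dataclass makes accidental changes during an experiment raise an error, which is one simple way to keep a reproduction aligned with the reported settings.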