Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator
Authors: Xiaolong Wang, Runsen Xu, Zhuofan Cui, Zeyu Wan, Yu Zhang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first introduce the two datasets used, the evaluation metrics, and the implementation details of our network. We then compare the performance of our HC-Net to the state of the art and examine its ability to generalize to new measurements within the same areas, across different areas, and across datasets. Finally, we present ablation studies and a computational efficiency analysis. |
| Researcher Affiliation | Academia | Xiaolong Wang¹, Runsen Xu³, Zhuofan Cui¹, Zeyu Wan¹, Yu Zhang¹,²; ¹ College of Control Science and Engineering, Zhejiang University; ² Key Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province; ³ The Chinese University of Hong Kong |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code. |
| Open Source Code | Yes | The code is available at https://github.com/xlwangDev/HC-Net. |
| Open Datasets | Yes | VIGOR dataset [42] contains geo-tagged ground-level panoramas and aerial images collected in four cities in the US. Each aerial patch corresponds to a ground area of approximately 70 m × 70 m... KITTI dataset [8] contains ground-level images captured by a moving vehicle with a forward-facing viewpoint, which is a restricted viewpoint. [24] augments the dataset with aerial images. |
| Dataset Splits | Yes | For validation and hyperparameter tuning, we randomly select 20% of the data from the training set, as done in [36, 12, 35]. |
| Hardware Specification | Yes | Table 4 compares model parameters, inference memory, per-frame inference time, and mean localization error on the VIGOR dataset using a 12th Gen Intel(R) Core(TM) i5-12490F processor, 16GB memory, and an NVIDIA RTX 3050 GPU. |
| Software Dependencies | No | PyTorch is used for network implementation, and training is done using the AdamW [17] optimizer with a maximum learning rate of 3.5 × 10⁻⁴. The network is trained with a batch size of 16 for 180,000 training iterations. We set the search radius of the correlation updater r = 4 and set α1 = 0.1, α2 = 10, α3 = 1.0, τ = 4 in the loss function. |
| Experiment Setup | Yes | Our network uses EfficientNet-B0 [29] with pretrained weights on ImageNet [5] as both the ground and aerial feature extractors, with non-shared weights. The satellite image and the bird's-eye-view (BEV) image transformed from the ground image both have a size of 512 × 512 on both the VIGOR [42] and KITTI [8] datasets. PyTorch is used for network implementation, and training is done using the AdamW [17] optimizer with a maximum learning rate of 3.5 × 10⁻⁴. The network is trained with a batch size of 16 for 180,000 training iterations. We set the search radius of the correlation updater r = 4 and set α1 = 0.1, α2 = 10, α3 = 1.0, τ = 4 in the loss function. |
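
The reported setup can be collected into a single configuration object, which may help when trying to reproduce the experiments. The following is a minimal sketch using only the Python standard library, not the authors' code: the class name `HCNetTrainConfig`, its field names, the helper `split_train_val`, and the random seed are all hypothetical, while the numeric values are taken from the Dataset Splits and Experiment Setup rows above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HCNetTrainConfig:
    """Hyperparameters quoted in the table; field names are illustrative only."""
    backbone: str = "EfficientNet-B0"   # ImageNet-pretrained, non-shared weights
    image_size: int = 512               # satellite and BEV images are 512 x 512
    optimizer: str = "AdamW"
    max_learning_rate: float = 3.5e-4
    batch_size: int = 16
    train_iterations: int = 180_000
    search_radius: int = 4              # r of the correlation updater
    alpha1: float = 0.1                 # loss-function weights
    alpha2: float = 10.0
    alpha3: float = 1.0
    tau: float = 4.0
    val_fraction: float = 0.2           # 20% of the training set held out


def split_train_val(sample_ids, val_fraction, seed=0):
    """Randomly hold out a fraction of the training samples for validation.

    The seed is an assumption; the source does not state one.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


config = HCNetTrainConfig()
# 1000 placeholder sample ids stand in for the real training set.
train_ids, val_ids = split_train_val(range(1000), config.val_fraction)
print(len(train_ids), len(val_ids))  # prints: 800 200
```

Freezing the dataclass keeps the reported values from being modified accidentally during a run; any learning-rate schedule implied by "maximum learning rate" is deliberately left out because the table does not specify it.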