reproducibilityindex.ai

Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator

Authors: Xiaolong Wang, Runsen Xu, Zhuofan Cui, Zeyu Wan, Yu Zhang

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we first introduce two used datasets, evaluation metrics, and implement details of our network. We then compare the performance of our HC-Net to state-of-the-art and examine its ability to generalize to new measurements within the same areas, across different areas, and across datasets. Finally, we present ablation studies and computational efficiency analysis.
Researcher Affiliation	Academia	Xiaolong Wang1, Runsen Xu3, Zhuofan Cui1, Zeyu Wan1, Yu Zhang1,2; 1 College of Control Science and Engineering, Zhejiang University; 2 Key Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province; 3 The Chinese University of Hong Kong
Pseudocode	No	The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor structured steps formatted like code.
Open Source Code	Yes	The code is available at https://github.com/xlwangDev/HC-Net.
Open Datasets	Yes	VIGOR dataset [42] contains geo-tagged ground-level panoramas and aerial images collected in four cities in the US. Each aerial patch corresponds to a ground area of approximately 70 m × 70 m... KITTI dataset [8] contains ground-level images captured by a moving vehicle with a forward-facing viewpoint, which is a restricted viewpoint. [24] augments the dataset with aerial images.
Dataset Splits	Yes	For validation and hyperparameter tuning, we randomly select 20% of the data from the training set, as done in [36, 12, 35].
Hardware Specification	Yes	Table 4 compares model parameters, inference memory, per-frame inference time, and mean localization error on the VIGOR dataset using a 12th Gen Intel(R) Core(TM) i5-12490F processor, 16GB memory, and an NVIDIA RTX 3050 GPU.
Software Dependencies	No	PyTorch is used for network implementation, and training is done using the AdamW [17] optimizer with a maximum learning rate of 3.5 × 10⁻⁴. The network is trained with a batch size of 16 and a training iteration of 180000. We set the search radius of the correlation updater r = 4 and set α1 = 0.1, α2 = 10, α3 = 1.0, τ = 4 in the loss function.
Experiment Setup	Yes	Our network uses EfficientNet-B0 [29] with pretrained weights on ImageNet [5] as both the ground and aerial feature extractors, with non-shared weights. The satellite image and bird's-eye-view (BEV) transformed from the ground image both have a size of 512 × 512 on both the VIGOR [42] and KITTI [8] datasets. PyTorch is used for network implementation, and training is done using the AdamW [17] optimizer with a maximum learning rate of 3.5 × 10⁻⁴. The network is trained with a batch size of 16 and a training iteration of 180000. We set the search radius of the correlation updater r = 4 and set α1 = 0.1, α2 = 10, α3 = 1.0, τ = 4 in the loss function.
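
For anyone attempting a reproduction, the settings quoted in the table can be gathered in one place. The following is a minimal sketch using only the Python standard library, not the authors' code: the class name `HCNetTrainConfig`, its field names, the helper `split_train_val`, and the `seed` argument are all hypothetical, and the paper excerpt does not say how the random 20% validation subset was seeded.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HCNetTrainConfig:
    """Values quoted in the table above; names are hypothetical."""
    backbone: str = "EfficientNet-B0"  # ImageNet-pretrained, non-shared weights
    image_size: int = 512              # satellite and BEV inputs are 512 x 512
    optimizer: str = "AdamW"
    max_lr: float = 3.5e-4             # reported maximum learning rate
    batch_size: int = 16
    iterations: int = 180_000
    search_radius: int = 4             # r of the correlation updater
    alpha1: float = 0.1                # loss-function weights as reported
    alpha2: float = 10.0
    alpha3: float = 1.0
    tau: float = 4.0
    val_fraction: float = 0.2          # share of training data held out


def split_train_val(sample_ids, val_fraction, seed=0):
    """Randomly hold out a fraction of the training samples.

    The seed is an assumption; the source only states that the
    selection is random.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]


cfg = HCNetTrainConfig()
# Placeholder IDs stand in for real training samples.
train_ids, val_ids = split_train_val(range(1000), cfg.val_fraction)
# Total samples processed over the reported training run.
samples_seen = cfg.batch_size * cfg.iterations
```

Fixing the seed makes the held-out subset repeatable across runs, which matters here because the split itself is not published in the quoted text.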