Cross-view Geo-localization with Layer-to-Layer Transformer

Authors: Hongji Yang, Xiufan Lu, Yingying Zhu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our L2LTR performs favorably against state-of-the-art methods on standard, fine-grained, and cross-dataset cross-view geo-localization tasks.
Researcher Affiliation | Academia | Hongji Yang, Xiufan Lu, Yingying Zhu; College of Computer Science and Software Engineering, Shenzhen University; {yanghongji2020, luxiufan2019}@email.szu.edu.cn, zhuyy@szu.edu.cn
Pseudocode | No | The paper includes illustrations of the model architecture (Figure 1) and mathematical formulations, but no structured pseudocode or algorithm blocks are provided.
Open Source Code | Yes | The code is available online: https://github.com/yanghongji2007/cross_view_localization_L2LTR
Open Datasets | Yes | To verify our model's effectiveness, we conduct extensive experiments on three widely used benchmarks: CVUSA [24] and CVACT [9] (including CVACT_val and CVACT_test). The CVUSA dataset provides 35,532 image pairs for training and 8,884 image pairs for testing. The CVACT dataset contains 35,532 pairs for training and 8,884 pairs for validation (denoted as CVACT_val).
Dataset Splits | Yes | The CVUSA dataset provides 35,532 image pairs for training and 8,884 image pairs for testing. The CVACT dataset contains 35,532 pairs for training and 8,884 pairs for validation (denoted as CVACT_val).
Hardware Specification | Yes | The model is trained using AdamW [10] with a cosine learning rate schedule on a 32GB NVIDIA V100 GPU.
Software Dependencies | No | The paper mentions 'AdamW [10]' as the optimizer and 'ImageNet [3]' as the source of pre-trained parameters, but does not specify version numbers for any software dependencies such as Python, PyTorch, or TensorFlow; only the optimizer name is given.
Experiment Setup | Yes | If not specified otherwise, the ground and aerial image sizes are set to 128 × 512 and 256 × 256, respectively. We empirically set model depth L to 12 and initialize our L2LTR with parameters pre-trained on ImageNet [3]. The model is trained using AdamW [10] with a cosine learning rate schedule on a 32GB NVIDIA V100 GPU. The learning rate is set to 1e-4, the weight decay to 0.03, and the batch size to 32. For the weighted soft-margin triplet loss [8], α is set to 10.
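For reference, the hyperparameters quoted above translate into roughly the following PyTorch sketch. This is not the authors' implementation: the two linear encoders stand in for the L2LTR ground and aerial branches, `num_steps` is an assumed value, and the in-batch negative handling is only illustrative; the optimizer, learning rate, weight decay, batch size, image sizes, and α = 10 are the values reported in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the two L2LTR branches (ground / aerial encoders).
# Any pair of networks producing same-size descriptors would slot in here.
ground_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 512, 64))
aerial_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 256 * 256, 64))

params = list(ground_net.parameters()) + list(aerial_net.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-4, weight_decay=0.03)  # values from the paper
num_steps = 10_000  # assumed; the paper only names a cosine learning rate schedule
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)

batch = 32
ground = torch.randn(batch, 3, 128, 512)   # ground panoramas, 128 x 512
aerial = torch.randn(batch, 3, 256, 256)   # aerial images, 256 x 256

g = F.normalize(ground_net(ground), dim=1)
a = F.normalize(aerial_net(aerial), dim=1)
dist = torch.cdist(g, a)                   # (batch, batch) pairwise L2 distances
pos = dist.diag().unsqueeze(1)             # matched ground-aerial pairs lie on the diagonal

# Weighted soft-margin triplet loss [8]: log(1 + exp(alpha * (d_pos - d_neg))),
# with alpha = 10 as reported, averaged here over all in-batch negatives.
alpha = 10.0
neg_mask = ~torch.eye(batch, dtype=torch.bool)
loss = F.softplus(alpha * (pos - dist)).masked_select(neg_mask).mean()

loss.backward()
optimizer.step()
scheduler.step()
```

`F.softplus(x)` is used as a numerically stable form of log(1 + exp(x)); whether negatives are averaged or hard-mined is an implementation detail not fixed by the quoted setup.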