Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization

Authors: Pengyue Jia, Seongheon Park, Song Gao, Xiangyu Zhao, Sharon Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the effectiveness of GeoRanker through extensive experiments on two widely used benchmarks: IM2GPS3K [17] and YFCC4K [18]. GeoRanker achieves state-of-the-art performance across all geographic thresholds.
Researcher Affiliation	Academia	Pengyue Jia (1, 2), Seongheon Park (2), Song Gao (3), Xiangyu Zhao (1), Sharon Li (2). (1) Department of Data Science, City University of Hong Kong; (2) Department of Computer Sciences, University of Wisconsin-Madison; (3) Department of Geography, University of Wisconsin-Madison.
Pseudocode	No	The paper describes the methodology using textual explanations, mathematical formulas, and diagrams (Figure 2), but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code	Yes	We also release our code, checkpoint, and dataset online for ease of reproduction. (Footnote 2: https://github.com/Applied-Machine-Learning-Lab/GeoRanker)
Open Datasets	Yes	To support this training paradigm, we construct GeoRanking, a new dataset that provides spatially diverse candidate sets for each query. Each candidate is annotated with GPS coordinates, textual descriptions (e.g., city, country), and image data. To the best of our knowledge, this is the first ranking dataset specifically designed for modeling spatial relationships among geographic entities. We believe this effort will significantly contribute to advancing research in related domains.
Dataset Splits	Yes	For evaluation, we follow previous work [8, 10, 14] and assess performance on two widely used public benchmarks IM2GPS3K [17] and YFCC4K [18]. The evaluation metric reports the percentage of predictions whose geodesic distance to the ground-truth coordinates falls within a set of thresholds: 1km, 25km, 200km, 750km, and 2500km.
Hardware Specification	Yes	All experiments are conducted using PyTorch on 4 NVIDIA L40S GPUs. Most experiments were conducted on four NVIDIA L40S GPUs. We also performed tests on two NVIDIA H200 GPUs, where training took approximately 7.5 hours per epoch with a batch size of 4, consuming around 90 GB of GPU memory per device with the gradient checkpointing off.
Software Dependencies	No	The paper mentions software such as PyTorch and the AdamW optimizer, and uses models such as Qwen2-VL-7B-Instruct and CLIP, but it does not specify version numbers for these components.
Experiment Setup	Yes	GeoRanker is fine-tuned with AdamW [58] optimizer with a learning rate of 1e-4, a batch size of 4, and for 1 epoch. For joint optimization, we set the weighting coefficient λ = 0.7, and K(1) = 1. For LoRA fine-tuning, we target the q_proj, k_proj, and v_proj modules, with a rank of 16, scaling factor of 32, and LoRA dropout of 0.05.
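
The evaluation metric quoted in the Dataset Splits row (percentage of predictions within 1km, 25km, 200km, 750km, and 2500km of the ground truth) can be illustrated with a short sketch. This is not the authors' evaluation code. It assumes the haversine great-circle formula on a spherical Earth as an approximation of geodesic distance, treats "within" as less than or equal to the threshold, and uses hypothetical names (`haversine_km`, `accuracy_at_thresholds`) chosen only for illustration.

```python
import math

# Assumption: mean Earth radius in km; the paper does not state which
# distance model or radius its evaluation uses.
EARTH_RADIUS_KM = 6371.0088

# Thresholds quoted in the Dataset Splits row above.
THRESHOLDS_KM = (1, 25, 200, 750, 2500)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_thresholds(preds, truths, thresholds=THRESHOLDS_KM):
    """Percentage of predictions whose distance to the truth is <= each threshold.

    preds and truths are equal-length sequences of (lat, lon) pairs in degrees.
    """
    if len(preds) != len(truths) or len(preds) == 0:
        raise ValueError("preds and truths must be non-empty and of equal length")
    dists = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, truths)]
    return {thr: 100.0 * sum(d <= thr for d in dists) / len(dists) for thr in thresholds}


if __name__ == "__main__":
    # Illustrative coordinates only: one exact hit and one Paris-vs-London miss.
    preds = [(48.8566, 2.3522), (48.8566, 2.3522)]
    truths = [(48.8566, 2.3522), (51.5074, -0.1278)]
    for thr, acc in accuracy_at_thresholds(preds, truths).items():
        print(f"{thr:>5} km: {acc:.1f}%")
```

Because the thresholds are nested, the reported percentages are non-decreasing from 1km to 2500km. An ellipsoidal geodesic (for example, on WGS84) would give slightly different distances than this spherical approximation, which can matter for predictions that fall close to a threshold.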