Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Scaling Image Geo-Localization to Continent Level

Authors: Philipp Lindenberger, Paul-Edouard Sarlin, Jan Hosang, Marc Pollefeys, Simon Lynen, Eduard Trulls

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our extensive evaluation demonstrates that our approach can localize within 200m more than 68% of queries of a dataset covering a large part of Europe. The code is publicly available at scaling-geoloc.github.io. ... Our experiments on a substantial portion of Europe indicate over 68% top-1 recall within 200 m, previously achievable only by city- or regional-scale retrieval systems [30]. (iii) We conduct a rigorous evaluation on a benchmark that covers most of western Europe at a much finer scale than previously explored in the literature. We systematically compare our approach against state-of-the-art classification and retrieval methods and provide detailed analyses of model components, such as losses, granularity and backbones, and cross-region generalization capabilities.
Researcher Affiliation	Collaboration	Philipp Lindenberger 1 EMAIL Paul-Edouard Sarlin 2 psarlin.com Jan Hosang 2 EMAIL Marc Pollefeys 1 EMAIL Simon Lynen 2 EMAIL Eduard Trulls 2 EMAIL 1ETH Zurich 2Google
Pseudocode	No	No clearly labeled pseudocode or algorithm blocks are present in the paper. The methodology is described in prose and illustrated with figures.
Open Source Code	Yes	Our extensive evaluation demonstrates that our approach can localize within 200m more than 68% of queries of a dataset covering a large part of Europe. The code is publicly available at scaling-geoloc.github.io.
Open Datasets	No	Publicly available ground-level datasets are not of sufficient scale or density for our purposes. We use Google Street View imagery, captured by six rolling-shutter, fish-eye cameras mounted on cars. ... We are unable to release the dataset used in the paper, because we do not own it (we have been granted special permission to use it).
Dataset Splits	Yes	We use sequences captured in year 2023 for evaluation only and sequences from the remaining years 2017-2024 for training. ... BEDENL 150M ... 1.5M ... Europe West 470M ... 4.5M ... We consider a maximum of 120 panos per L=14 S2 cell [32], enforce that panos are at least 40 m apart to prevent oversampling, and skip cells that contain less than 5 panos. We then generate 4 image crops per pano.
Hardware Specification	Yes	We train with 128 16GB TPUv2 [37].
Software Dependencies	No	We use the Adam [45] optimizer with learning rates of 0.003 for the encoders and 0.01 for the prototypes, both annealed to 10^-6 by the end of training using a cosine schedule. Unless stated otherwise, we use Vision Transformers [46] (B16) initialized with the iBOT [42] weights, and scale the best setups to larger models.
Experiment Setup	Yes	We train our models with 64 examples per batch per device (8192 examples in total) for 200k steps (~3 epochs on Europe West), with 1 s per step and a total time of 2.5 days. We use the Adam [45] optimizer with learning rates of 0.003 for the encoders and 0.01 for the prototypes, both annealed to 10^-6 by the end of training using a cosine schedule. Unless stated otherwise, we use Vision Transformers [46] (B16) initialized with the iBOT [42] weights, and scale the best setups to larger models. During training we randomly drop layers with a probability of 0.1. The SALAD head [24] has 32 64-D clusters and a 128-D class token. All embeddings are l2-normalized. The loss is parameterized by α=0.2, β=100, and λ=0.2.
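
The training figures quoted in the Experiment Setup row can be sanity-checked with a short sketch. It is an illustration, not the authors' code. It assumes that the 470M figure for Europe West counts training examples, that the cosine schedule starts at the stated peak learning rate with no warmup, and that the function and constant names (all hypothetical) stand in for whatever the released code uses.

```python
import math


def cosine_lr(step, total_steps, peak_lr, final_lr=1e-6):
    """Cosine annealing from peak_lr at step 0 down to final_lr at total_steps.

    Assumption: no warmup phase; the quoted text does not say whether one is used.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


# Values quoted in the table rows above.
DEVICES = 128                # "128 16GB TPUv2"
PER_DEVICE_BATCH = 64        # "64 examples per batch per device"
TOTAL_STEPS = 200_000        # "200k steps"
TRAIN_EXAMPLES = 470_000_000 # "Europe West 470M" (assumed to be the training-set size)

global_batch = DEVICES * PER_DEVICE_BATCH
epochs = global_batch * TOTAL_STEPS / TRAIN_EXAMPLES

if __name__ == "__main__":
    print("global batch size:", global_batch)
    print("approximate epochs:", round(epochs, 2))
    for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, cosine_lr(step, TOTAL_STEPS, 0.003), cosine_lr(step, TOTAL_STEPS, 0.01))
```

The product 128 × 64 reproduces the quoted total of 8192 examples per step. Under the stated assumption about the dataset size, 200k steps correspond to between three and four passes over Europe West, in line with the roughly 3 epochs reported.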