Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data

Authors: Lubin Bai, Xiuyuan Zhang, Siqi Zhang, Zepeng Zhang, Haoyu Wang, Wei Qin, Shihong Du

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that incorporating OSM data during pretraining enhances the performance of the RS image encoder, while fusing RS and OSM data in downstream tasks improves the FM's adaptability to complex geographic scenarios.
Researcher Affiliation	Academia	1 School of Earth and Space Sciences, Peking University, Beijing, China 2 College of Urban and Environmental Sciences, Peking University, Beijing, China 3 State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS, Beijing, China 4 Intelligent Maintenance and Operations Systems Lab, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
Pseudocode	No	The paper provides detailed descriptions of the model architecture, learning objectives, and experimental procedures, along with figures illustrating the framework (Figure 1, 2, 4, 5). However, it does not include explicitly labeled pseudocode or algorithm blocks with structured steps.
Open Source Code	Yes	Code, checkpoints, and using examples are released at https://github.com/bailubin/GeoLink_NeurIPS2025
Open Datasets	Yes	To pretrain GeoLink, we construct a multimodal dataset derived from SkyScript-top30 [31]. The SkyScript-top30 dataset contains multi-source, multi-resolution RS images with RGB bands, featuring ground sample distances (GSD) ranging from 0.1 m/pixel to 30 m/pixel. ... For comprehensive evaluation, we employ seven RS benchmarks that span diverse spatial resolutions and category systems: MLRSNet [51], EuroSAT [52], WHU-RS19 [53], OPTIMAL-31 [54], RESISC-45 [55], AiRound [56], and UCMerced [57]. ... We evaluate performance on four benchmarks from PANGAEA-bench: Five-Billion-Pixels [58], AI4SmallFarms [59], xView2 [60], and SpaceNet7 [61] ... Grid-based population density and carbon emission data for Chicago, Singapore, and Shenzhen are sourced from WorldPop and ODIAC to construct evaluation benchmarks (details in Appendix C).
Dataset Splits	Yes	All benchmarks are split into 50% for training, 10% for validation, and 40% for testing. ... The data is split into 50% for training, 10% for validation, and 40% for testing. All results are averaged over three runs with different random seeds. For semantic segmentation and change detection tasks, all the settings of data split, learning rate, and loss function follow the default of the PANGAEA-bench [17].
Hardware Specification	Yes	Our experiments are conducted on a Linux server equipped with 4 NVIDIA RTX6000 GPUs (48GB) using bfloat16 precision.
Software Dependencies	No	The paper mentions the use of a BERT language model for encoding OSM tags and AdamW for optimization, but it does not specify version numbers for any software dependencies, libraries, or programming languages used for implementation.
Experiment Setup	Yes	We pretrain it for only 60 epochs (including 5 warmup epochs), with a batch size of 2640, a base learning rate of 1 × 10⁻⁴, and a cosine decay schedule for learning rate cooldown. The default masking ratios for RS patch and OSM graph node is 75% and 20%. And τ = 0.2 for contrastive loss. ... The model optimization employs AdamW with hyperparameters (β1 = 0.9, β2 = 0.95) and a weight decay of 0.05. ... To ensure the reproducibility of our experiments, we provide detailed hyperparameter settings for all downstream tasks in Table 9.
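
The schedule and split quoted in the table can be illustrated with a short Python sketch. It is not taken from the GeoLink repository. The function names, the linear warmup shape, the decay floor of zero, and the reading of the base learning rate as 1e-4 are all assumptions made for illustration, and any batch-size scaling of the learning rate is not modeled.

```python
import math
import random

# Values quoted in the Experiment Setup row of the table.
TOTAL_EPOCHS = 60
WARMUP_EPOCHS = 5
BASE_LR = 1e-4  # base learning rate, taken here to be 1e-4 (assumption)


def lr_at_epoch(epoch: float) -> float:
    """Learning rate at a (possibly fractional) epoch.

    Assumed shape: linear warmup from 0 to BASE_LR over the warmup epochs,
    then cosine decay from BASE_LR down to 0 over the remaining epochs.
    """
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def split_indices(n: int, seed: int):
    """Shuffle indices 0..n-1 with the given seed and cut them into
    50% training, 10% validation, and the remainder (about 40%) testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = n // 2
    n_val = n // 10
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test
```

Calling `split_indices(n, seed)` with three different seeds would mirror the "three runs with different random seeds" protocol quoted above, though the seeds and shuffling procedure the authors actually used are not stated on this page.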