Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution

Authors: Fengxiang Wang, Mingshuo Chen, Yueying Li, Di Wang, Haotian Wang, Zonghao Guo, Zefan Wang, Shan Boqi, Long Lan, Yulin Wang, Hongzhen Wang, Wenjing Yang, Bo Du, Jing Zhang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on large image-size benchmarks (e.g., XLRS-Bench) demonstrate that our model outperforms all existing open- and closed-source MLLMs, setting a new state-of-the-art. ... We perform SFT training on the Super RS-VQA and High RS-VQA datasets, with a brief overview of the training details and results in this section. Exploratory and ablation studies are presented in Section 3 to clarify the research motivation.
Researcher Affiliation	Academia	1 College of Computer Science and Technology, National University of Defense Technology, China 2 Beijing University of Posts and Telecommunications, China 3 School of Computer Science, Wuhan University, China 4 Zhongguancun Academy, China 5 Tsinghua University, China 6 Beihang University, China
Pseudocode	No	The paper describes methods like "Background Token Pruning" and "Anchored Token Selection" using figures and descriptive text, but it does not present them in a structured pseudocode or algorithm block format.
Open Source Code	No	Datasets and code were released at GeoLLaVA-8K. ... We also plan to open-source our dataset and code to further contribute to the community. ... Due to time constraints and submission size limits, the data and code cannot be included at this stage, but we commit to open-sourcing all datasets and code as soon as possible.
Open Datasets	Yes	To address the first issue, we introduce two novel multimodal RS datasets featuring large image sizes: Super RS-VQA (about 8K × 8K) and High RS-VQA (about 2K × 2K). To our knowledge, they are the largest image-size RS vision-language datasets to date, covering 22 real-world subtasks and significantly surpassing previous RS image-text datasets in both scale and diversity. ... We curate Super RS-VQA and High RS-VQA, two RS image-text datasets covering 22 real-world subtasks for UHR scenes, featuring so far the largest image sizes in our knowledge. ... We provide a new dataset, which will be released in the future for use by the research community. ... We will create a repository to release the data once the paper is officially published. ... Super RS-VQA and High RS-VQA are self-contained and will be open-sourced on platforms like Hugging Face for easy use. ... The datasets will be made publicly accessible to the research community.
Dataset Splits	No	The paper does not explicitly state specific training/validation/test splits for the newly introduced Super RS-VQA and High RS-VQA datasets. It mentions using XLRS-Bench for evaluation, which implies a test set, but no split methodology for the training data is provided beyond using the entire dataset for SFT.
Hardware Specification	Yes	Similar to previous work, we used 1624 A100 GPUs as computing resources. ... Training with a ratio of 16 frequently led to out-of-memory (OOM) errors, even on multi-GPU setups (8 or 16 GPUs). ... In contrast, the 24-token setting achieved the best average accuracy (51.5) and enabled stable training on a single node with 8 GPUs
Software Dependencies	No	The paper mentions several models and frameworks like LLaVA-Next-7B, LLaVA-1.5, GPT-4o, and CLIP-ViT, but it does not specify version numbers for general software dependencies like Python, PyTorch, or CUDA, which are necessary for full reproducibility.
Experiment Setup	Yes	Table 9: Training Configuration of GeoLLaVA-8K (Configuration: Parameter). Resolution: 8,064 × 8,064; Dataset: 81,367 (Super RS-VQA + High RS-VQA); Batch Size: 16; LR (vision): 1e-6; LR (proj, LLM): 5e-6; ZeRO stage: ZeRO 2; Epoch: 1.
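
For readers who want the Experiment Setup values in machine-readable form, the following is a minimal sketch that restates the Table 9 entries as a Python dataclass. It is not the authors' training script. The class, field, and function names are hypothetical, the dataset figure is assumed to count training samples, and the batch size of 16 is assumed to be the global batch size with no gradient accumulation; the paper excerpt above confirms none of these. The `zero_optimization` / `stage` keys follow the DeepSpeed configuration format, which is assumed here only because Table 9 reports a ZeRO stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values copied from Table 9; field names are illustrative only."""
    resolution: tuple = (8064, 8064)
    num_samples: int = 81_367   # Super RS-VQA + High RS-VQA (assumed: sample count)
    batch_size: int = 16        # assumed: global batch size
    lr_vision: float = 1e-6     # learning rate for the vision encoder
    lr_proj_llm: float = 5e-6   # learning rate for the projector and LLM
    zero_stage: int = 2
    epochs: int = 1


def steps_per_epoch(cfg: TrainConfig) -> int:
    """Optimizer steps per epoch, keeping the final partial batch.

    Assumes no gradient accumulation (not stated in the source).
    """
    return -(-cfg.num_samples // cfg.batch_size)  # ceiling division


def zero_section(cfg: TrainConfig) -> dict:
    """Fragment of a DeepSpeed-style config carrying only the ZeRO stage."""
    return {"zero_optimization": {"stage": cfg.zero_stage}}


if __name__ == "__main__":
    cfg = TrainConfig()
    print(steps_per_epoch(cfg))
    print(zero_section(cfg))
```

Under these assumptions, one epoch over 81,367 samples at batch size 16 corresponds to 5,086 optimizer steps. If 16 is instead a per-device batch size, the step count shrinks in proportion to the number of GPUs.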