Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Authors: Fengxiang Wang, Yulin Wang, Mingshuo Chen, Haotian Wang, Hongzhen Wang, Haiyan Zhao, Yangang Sun, Shuo Wang, Di Wang, Long Lan, Wenjing Yang, Jing Zhang
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Systematic empirical studies validate that Mamba adheres to RS data and parameter scaling laws, with performance scaling reliably as model and data size increase. Furthermore, experiments across scene classification, change detection, and semantic segmentation tasks demonstrate that RoMA-pretrained Mamba models consistently outperform ViT-based counterparts in both accuracy and computational efficiency. The source code and pretrained models were released at RoMA. |
| Researcher Affiliation | Academia | Fengxiang Wang1, Yulin Wang2, Mingshuo Chen3, Haiyan Zhao2, Yangang Sun2, Shuo Wang2, Hongzhen Wang2, Di Wang4,5, Long Lan1, Wenjing Yang1, Jing Zhang4; 1 College of Computer Science and Technology, National University of Defense Technology, China; 2 Tsinghua University, China; 3 Beijing University of Posts and Telecommunications, China; 4 School of Computer Science, Wuhan University, China; 5 Zhongguancun Academy, China |
| Pseudocode | No | The paper describes methods through figures and textual explanations of steps, such as in "Figure 3: Overview of the RoMA Pretraining Pipeline" and "Figure 4: Illustration of the Adaptive Rotation Encoding Strategy", but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The source code and pretrained models were released at RoMA. (Introduction) Question: Does the paper provide open access to the data and code... Answer: [Yes] Justification: ...the data and code cannot be included at this stage, but we commit to open-sourcing all datasets and code as soon as possible. (NeurIPS Checklist Q5) |
| Open Datasets | Yes | We use two scene classification datasets: AID [62] and UCM [63]... We used the OSCD [64] dataset... using common remote sensing datasets: SpaceNetv1 [65]. We train both the Mamba-B on the OpticalRS-4M [16]... We evaluate its performance on small-object categories in the iSAID dataset [71]. |
| Dataset Splits | Yes | We use two scene classification datasets: AID [62] and UCM [63], with training details, including the train-test split ratio, following [10, 13]. Evaluation is based on overall accuracy (OA). The results in Table 2 show RoMA's competitive performance compared to other pretraining methods. OA (TR=50%) (Table 2); OA (TR=20%), OA (TR=50%) (Table 3). Following previous works [61], we kept the experimental setups consistent, using UNet [68] as the decoder. (Change Detection) |
| Hardware Specification | Yes | Mamba-B achieves 1.56× faster inference and reduces GPU memory usage by 78.9% on 1248×1248 resolution images (6084 tokens per image) on a single NVIDIA 4090 GPU (batch size = 2). Both RoMA-Base and ViT-Base were tested for GPU memory usage and inference speed on a single NVIDIA A100 (batch size = 1). Similar to previous work, we used 16 24 A100 GPUs as computing resources. |
| Software Dependencies | No | The paper mentions software components like "AdamW optimizer", "UNet [68]", and "UperNet [69]", but does not specify their version numbers or other library versions (e.g., Python, PyTorch). |
| Experiment Setup | Yes | Our pretraining experiment setup largely follows ARM [22]. We train both the Mamba-B on the OpticalRS-4M [16]. We adjust the input image to a size of 196×196, with a patch size of 16, using the AdamW optimizer and a cosine learning rate scheduler. The initial learning rate is set to 1.5e-4, and batch size is set to 256, with a epoch of 400. |
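
The pretraining hyperparameters quoted in the Experiment Setup row, and the token count quoted in the Hardware Specification row, can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, and the schedule assumes plain cosine decay to a minimum learning rate of 0 with no warmup, neither of which is stated in the quoted text.

```python
import math

# Values quoted in the Experiment Setup row.
BASE_LR = 1.5e-4   # initial learning rate
EPOCHS = 400       # total pretraining epochs
BATCH_SIZE = 256
PATCH_SIZE = 16


def cosine_lr(epoch: int, base_lr: float = BASE_LR,
              total_epochs: int = EPOCHS, min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr to min_lr.

    Assumption: no warmup and min_lr = 0; the quoted text gives neither.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def tokens_per_image(side: int, patch: int = PATCH_SIZE) -> int:
    """Number of non-overlapping patch tokens for a square image."""
    return (side // patch) ** 2


if __name__ == "__main__":
    for epoch in (0, 100, 200, 300, 400):
        print(epoch, cosine_lr(epoch))
    # Token count for the 1248x1248 inference benchmark in the table.
    print(tokens_per_image(1248))
```

With a patch size of 16, a 1248×1248 image gives 78 patches per side, which is consistent with the 6084 tokens per image reported in the Hardware Specification row.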