Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Actial: Activate Spatial Reasoning Ability of Multimodal Large Language Models

Authors: Xiaoyu Zhan, Wenxuan Huang, Hao Sun, Xinyu Fu, Changfeng Ma, Shaosheng Cao, Bohan Jia, Shaohui Lin, Zhenfei Yin, Lei Bai, Wanli Ouyang, Yuanqi Li, Jie Guo, Yanwen Guo

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments across multiple benchmarks demonstrate the effectiveness of Viewpoint Learning in activating the spatial reasoning ability in MLLMs. We evaluate Actial across multiple benchmarks using VLMEvalKit [11].
Researcher Affiliation | Collaboration | 1 Nanjing University; 2 Xiaohongshu Inc.; 3 East China Normal University; 4 The Chinese University of Hong Kong; 5 Shanghai Jiao Tong University; 6 University of Oxford
Pseudocode | No | The paper includes a pipeline overview in Figure 3, but does not present any structured pseudocode or algorithm blocks.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The open accesses to the data and code are in progress. But not completed yet.
Open Datasets | Yes | We present the Viewpoint-100K dataset, consisting of 100K object-centric image pairs with diverse viewpoints and corresponding question-answer pairs. We automatically generate Viewpoint-100K from MVImgNet [47]... We apply Reinforcement Learning (RL) through the Group Relative Policy Optimization (GRPO) algorithm [32], further fine-tuning the model on the SAT dataset [30], a synthetic dataset for spatial aptitude training.
Dataset Splits | Yes | Training datasets. Actial performs the two-stage fine-tuning strategy. We use Viewpoint-100K training set for knowledge injection and SAT training set [30] for generalization enhancement. The test set of Viewpoint-100K consists of 1,000 examples.
Hardware Specification | No | The paper does not explicitly mention specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. It only details training parameters like epochs, learning rates, and batch sizes.
Software Dependencies | No | The paper mentions using "Qwen2.5-VL-7B-Instruct [1] as our baseline model" but does not specify versions for programming languages, libraries, or other software dependencies necessary for replication (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | In the SFT phase, we trained for 2 epochs with a learning rate of 5e-6, a batch size of 128 and 50 warm-up steps. We mix the Viewpoint-100K dataset and pseudo CoT data as inputs. The interleave ratio is set to 0.9:0.1. In the GRPO phase, we trained for 150 steps with a learning rate of 1e-6 and a batch size of 1024. The model is trained from post-SFT model with an 4K token generation limit, sampling 16 samples per input. During training, we set the Kullback Leibler (KL) penalty [32, 31] to 0.2 and 1e-2 for the hyper-parameters ϵ and β, respectively. Within the reward function, the format reward and the result reward are each assigned a score of 0.5.
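The reward described in the Experiment Setup entry (format reward and result reward, 0.5 each, with 16 samples per input) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `<think>`/`<answer>` response template, the case-insensitive exact-match check, the choice to grant the result reward only when the format is valid, and the function names `reward` and `group_relative_advantages` are all assumptions made for illustration. The group normalization follows the commonly described GRPO scheme of standardizing rewards within each sampled group.

```python
import re
from statistics import mean, pstdev

# Values quoted in the Experiment Setup entry.
FORMAT_REWARD = 0.5   # score for a well-formed response
RESULT_REWARD = 0.5   # score for a correct final answer
GROUP_SIZE = 16       # samples drawn per input during GRPO

# ASSUMPTION: the response template is not given in the summary; this
# <think>...</think><answer>...</answer> layout is a hypothetical stand-in.
_TEMPLATE = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)


def reward(completion: str, gold: str) -> float:
    """Return format reward plus result reward for one sampled completion."""
    match = _TEMPLATE.match(completion.strip())
    if match is None:
        # ASSUMPTION: no answer can be extracted without a valid format,
        # so a malformed response earns nothing.
        return 0.0
    score = FORMAT_REWARD
    # ASSUMPTION: answers are compared by case-insensitive exact match.
    if match.group(1).strip().lower() == gold.strip().lower():
        score += RESULT_REWARD
    return score


def group_relative_advantages(rewards, tiny=1e-6):
    """Standardize rewards within one group of samples for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # `tiny` avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + tiny) for r in rewards]
```

In use, each of the 16 completions sampled for one input would be scored with `reward`, and the resulting list passed to `group_relative_advantages`; a group in which every completion receives the same score yields all-zero advantages and therefore contributes no policy-gradient signal under this scheme.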