Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
Authors: Wenyi Xiao, Leilei Gan
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across seven reasoning benchmarks demonstrate that FAST achieves state-of-the-art accuracy with over 10% relative improvement compared to the base model, while reducing token usage by 32.7-67.3% compared to previous slow-thinking approaches, effectively balancing reasoning length and accuracy. We conduct extensive experiments on a range of reasoning benchmarks for LVLMs, and the experimental results have demonstrated the effect of the proposed method. |
| Researcher Affiliation | Academia | Wenyi Xiao Zhejiang University EMAIL Leilei Gan Zhejiang University EMAIL |
| Pseudocode | Yes | Algorithm 1 FAST-GRPO Training |
| Open Source Code | Yes | https://github.com/Mr-Loevan/FAST. Correspondence to Leilei Gan. We commit to releasing all the code, data, and model checkpoints for experimental results reproducibility. |
| Open Datasets | Yes | Training Dataset. Starting with 500K questions from LLaVA-CoT [23], Mulberry [39], and MathV-360K [40], we first apply filters for answer verifiability. Evaluation Benchmarks. We evaluate on 7 widely used multimodal benchmarks: (1) MathVision [42], (2) MathVerse [43], (3) MathVista [44], (5) WeMath [46], (6) DynaMath [47], and (7) MM-Vet [48]. |
| Dataset Splits | Yes | To this end, we stratified the Geometry3K training dataset into three difficulty tiers using the pass@8 metric (i.e., the probability of correctly solving a question within eight attempts): Easy (pass@8 ≥ 0.75), Medium (0.25 < pass@8 < 0.75), and Hard (pass@8 ≤ 0.25). This categorization resulted in approximately 35% Easy, 25% Medium, and 40% Hard samples. Second, we apply Slow-to-Fast sampling to remove questions with extreme extrinsic difficulty scores (S_extrinsic = 0 or 1), yielding 18K training questions. |
| Hardware Specification | Yes | All training experiments were conducted using H20 GPUs. |
| Software Dependencies | No | We implement FAST using Qwen2.5-VL-3B and 7B as our base models. Model inference in evaluations is performed using the vLLM framework [59], and our training implementation extends the veRL codebase [60]. |
| Experiment Setup | Yes | Table 9 (Training Hyperparameters): Model: Qwen2.5-VL; Epochs: 10; Learning Rate: 1e-6; Train Batch Size: 512; Temperature: 1.0; Rollout per Prompt: 8; Prompt Max Length: 4096; Generation Max Length: 4096; Max KL Coefficient: 0.03; Min KL Coefficient: 0.001; Precision: BF16; Max Pixels: 1,000,000; λf: 0.5; λt: 0.5 |
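
The Dataset Splits entry quotes two filtering steps: a pass@8 tiering of Geometry3K into Easy (pass@8 ≥ 0.75), Medium (0.25 < pass@8 < 0.75), and Hard (pass@8 ≤ 0.25), and a Slow-to-Fast step that drops questions whose extrinsic difficulty score is exactly 0 or 1. The sketch below shows only the bookkeeping for those two steps. It assumes the per-question pass@8 and S_extrinsic values have already been computed, because the quoted text does not say how they are estimated. The function names, the `s_extrinsic` field, and the example scores are hypothetical and do not come from the FAST codebase.

```python
from collections import Counter

TIERS = ("Easy", "Medium", "Hard")


def difficulty_tier(pass_at_8: float) -> str:
    """Map a precomputed pass@8 score in [0, 1] to the tiers quoted above."""
    if not 0.0 <= pass_at_8 <= 1.0:
        raise ValueError(f"pass@8 must lie in [0, 1], got {pass_at_8}")
    if pass_at_8 >= 0.75:
        return "Easy"
    if pass_at_8 > 0.25:
        return "Medium"
    return "Hard"  # pass@8 <= 0.25


def tier_shares(scores: list[float]) -> dict[str, float]:
    """Fraction of questions that fall into each tier."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(difficulty_tier(s) for s in scores)
    return {tier: counts[tier] / len(scores) for tier in TIERS}


def drop_extreme_extrinsic(questions: list[dict]) -> list[dict]:
    """Keep questions whose hypothetical 's_extrinsic' field lies strictly in (0, 1)."""
    return [q for q in questions if 0.0 < q["s_extrinsic"] < 1.0]


if __name__ == "__main__":
    # Hypothetical per-question scores, not data from the paper.
    scores = [1.0, 0.875, 0.5, 0.25, 0.0, 0.125, 0.75, 0.375]
    print(tier_shares(scores))  # {'Easy': 0.375, 'Medium': 0.25, 'Hard': 0.375}
    pool = [
        {"id": "q1", "s_extrinsic": 0.0},
        {"id": "q2", "s_extrinsic": 0.4},
        {"id": "q3", "s_extrinsic": 1.0},
    ]
    print([q["id"] for q in drop_extreme_extrinsic(pool)])  # ['q2']
```

The boundary values 0.75 and 0.25 are assigned to Easy and Hard respectively. This assignment makes the three quoted ranges cover [0, 1] without gaps or overlaps.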
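
The Research Type entry reports two relative figures: an accuracy gain of over 10% relative to the base model, and a token-usage reduction of 32.7-67.3% relative to earlier slow-thinking approaches. The sketch below shows how such figures are usually read. It assumes the standard definition (new - baseline) / baseline, which the quoted text does not state explicitly. The function names and all numbers are illustrative and are not results from the paper.

```python
def relative_change(new: float, baseline: float) -> float:
    """Signed relative change of `new` against `baseline` (0.10 means +10%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


def tokens_after_reduction(baseline_tokens: float, reduction: float) -> float:
    """Tokens remaining after a fractional reduction (e.g. 0.327 for 32.7%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must lie in [0, 1]")
    return baseline_tokens * (1.0 - reduction)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    print(relative_change(44.0, 40.0))          # a 40.0 -> 44.0 accuracy change
    print(tokens_after_reduction(1000, 0.327))  # lower end of the quoted range
    print(tokens_after_reduction(1000, 0.673))  # upper end of the quoted range
```

Under this reading, the quoted range means FAST's responses use roughly 67.3% down to 32.7% of the tokens used by the compared slow-thinking baselines.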