Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Latent Chain-of-Thought for Visual Reasoning

Authors: Guohao Sun, Hang Hua, Jian Wang, Jiebo Luo, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that the proposed method enhances the state-of-the-art LVLMs on seven reasoning benchmarks, in terms of effectiveness, generalization, and interpretability. The code is available at https://github.com/heliossun/LaCoT. ... Empirically, we develop the proposed LaCoT on two base models, Qwen2.5-VL [3] 3B and 7B, where the 7B model achieves an improvement of 6.6% over its base model and outperforms GRPO by 10.6%. The 3B model surpasses its base model by 13.9% and achieves better results than larger models, e.g., LLaVA-CoT-11B and LLaVA-OV-7B, demonstrating the effectiveness of learning to sample latent CoT on reasoning benchmarks.
Researcher Affiliation | Collaboration | Guohao Sun1,2, Hang Hua2,3,+, Jian Wang2,+, Jiebo Luo3, Sohail Dianat1, Majid Rabbani1, Raghuveer Rao4, and Zhiqiang Tao1. Affiliations: 1 Rochester Institute of Technology; 2 Snap Inc.; 3 University of Rochester; 4 DEVCOM Army Research Laboratory
Pseudocode | No | The paper includes mathematical equations and figures to describe the methodology but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | The code is available at https://github.com/heliossun/LaCoT.
Open Datasets | Yes | For training πΦ, we consider two pre-trained LVLMs as the base models, including Qwen2.5-VL-3B & 7B [3] and a mixture of visual reasoning datasets from LLaVA-CoT [47] and R1-Onevision [48]. As shown in Fig. 5, we formulate the instructional data with a new special token Analyzer. ... Benchmarks. This work utilizes three mathematical and one general domain reasoning benchmarks: (i) MathVista [24]: ... (ii) MathVision [42]: ... (iii) MathVerse [55]: ... (iv) MMMU [51]: ... Furthermore, we conduct additional experiments on MMMU-Pro [52], MMVet [50], and MME [9], where MMMU-Pro is a more robust version of MMMU, designed to assess LVLMs' understanding and reasoning capabilities more rigorously.
Dataset Splits | Yes | We resample 3k visual reasoning samples from the SFT data, where each consists of (image, query, CoT, and answer). Note that we use the CoTs generated by teacher models, such as GPT-4o or DeepSeek-R1, as our reference rationale Zref in Eq. (6). ... Benchmarks. This work utilizes three mathematical and one general domain reasoning benchmarks: (i) MathVista [24]: ... (ii) MathVision [42]: ... (iii) MathVerse [55]: ... (iv) MMMU [51]: ... Furthermore, we conduct additional experiments on MMMU-Pro [52], MMVet [50], and MME [9], where MMMU-Pro is a more robust version of MMMU, designed to assess LVLMs' understanding and reasoning capabilities more rigorously. ... Table 1: Test accuracy (%) on visual reasoning benchmarks.
Hardware Specification | Yes | This work utilizes an 8*80GB GPU-node for training.
Software Dependencies | No | The paper mentions using techniques and optimizers like LoRA, DeepSpeed ZeRO-3, gradient checkpointing, and AdamW, but it does not provide specific software names along with their version numbers required for replication.
Experiment Setup | Yes | More hyperparameter settings can be found in Appendix B.5. Table T7: Hyperparameters for training. LoRA dropout: 0.05; Batch size (SFT): 2; Batch size (RGFN): 1; Gradient accumulation (SFT): 16; Learning rate: 0.00001; Optimizer: AdamW; Weight decay: 0.05; Temperature max: 1.0; Temperature min: 0.5; Reward temperature start: 1.0; Reward temperature end: 0.7; Reward temperature horizon: 50; Exploration number: 6; λ: 8; τmax: 1.5; τmin: 1.0; Maximum rationale length: 700; Minimum rationale length: 64
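
For readers who want to carry the reported settings into their own scripts, the sketch below collects the hyperparameters from the Experiment Setup row into a single Python object and works through two small calculations. It is not taken from the authors' repository. All field and function names are hypothetical. Two points are assumptions that the source does not state: that the SFT batch size of 2 is per GPU (so the 8 GPUs from the Hardware Specification row multiply it), and that the reward temperature moves linearly from its start value to its end value over the stated horizon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from the reported hyperparameter table; names are hypothetical."""
    lora_dropout: float = 0.05
    batch_size_sft: int = 2
    batch_size_rgfn: int = 1
    grad_accum_sft: int = 16
    learning_rate: float = 1e-5
    optimizer: str = "AdamW"
    weight_decay: float = 0.05
    temperature_max: float = 1.0
    temperature_min: float = 0.5
    reward_temp_start: float = 1.0
    reward_temp_end: float = 0.7
    reward_temp_horizon: int = 50
    exploration_number: int = 6
    lam: float = 8.0
    tau_max: float = 1.5
    tau_min: float = 1.0
    max_rationale_length: int = 700
    min_rationale_length: int = 64
    num_gpus: int = 8  # from the hardware row (one 8 x 80GB GPU node)


def effective_sft_batch_size(cfg: TrainingConfig) -> int:
    # ASSUMPTION: the reported batch size is per GPU; the source does not say.
    return cfg.batch_size_sft * cfg.grad_accum_sft * cfg.num_gpus


def reward_temperature(step: int, cfg: TrainingConfig) -> float:
    # ASSUMPTION: linear interpolation from start to end over the horizon,
    # held constant afterwards. The source gives only the three numbers.
    clamped = min(max(step, 0), cfg.reward_temp_horizon)
    frac = clamped / cfg.reward_temp_horizon
    return cfg.reward_temp_start + frac * (cfg.reward_temp_end - cfg.reward_temp_start)


if __name__ == "__main__":
    cfg = TrainingConfig()
    print("effective SFT batch size:", effective_sft_batch_size(cfg))
    for step in (0, 25, 50, 100):
        print(step, round(reward_temperature(step, cfg), 3))
```

If the reported batch size is instead global rather than per GPU, the effective SFT batch size would be 2 x 16 = 32, so the per-GPU reading should be checked against the released code before relying on it.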