Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Authors: Xi Chen, Mingkang Zhu, Shaoteng Liu, Xiaoyang Wu, Xiaogang Xu, Yu Liu, Xiang Bai, Hengshuang Zhao
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that, although trained solely on visual comparison tasks, the learned reasoning ability generalizes effectively to a wide range of questions. Without relying on any human-annotated question-answer pairs, our method achieves significant improvements on multi-image reasoning benchmarks and shows strong performance on general vision tasks. (Section 4, Experiments) |
| Researcher Affiliation | Collaboration | Xi Chen¹, Mingkang Zhu³, Shaoteng Liu³, Xiaoyang Wu¹, Xiaogang Xu³, Yu Liu², Xiang Bai⁴, Hengshuang Zhao¹. ¹HKU, ²Tongyi Lab, Alibaba Group, ³CUHK, ⁴HUST |
| Pseudocode | Yes | Algorithm 1 MiCo: Reinforcement Multi-image Reasoning |
| Open Source Code | No | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: This work is cooperating with the company, we need to apply for approval. |
| Open Datasets | Yes | For the training data, we use OmniEdit [35] for image editing pairs and extract video frames from Vidgen-1M [30]. The part of reinforcement learning follows GRPO [28], we set a |
| Dataset Splits | No | For the training data, we use OmniEdit [35] for image editing pairs and extract video frames from Vidgen-1M [30]. The part of reinforcement learning follows GRPO [28], we set a format reward and accuracy reward with the weight of 1:1, respectively. Besides, we also apply a KL regularization with a weight of 0.01. During training, we follow previous works [23] to skip the rollout group with all correct/false answers. During training, we use a learning rate of 1e-6 and set the batch size of 16. For each training sample, we generate a group of 8 rollouts. We train the model for 600 iterations on 8 A100 GPUs. |
| Hardware Specification | Yes | We train the model for 600 iterations on 8 A100 GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper, such as Python or PyTorch versions. |
| Experiment Setup | Yes | we set a format reward and accuracy reward with the weight of 1:1, respectively. Besides, we also apply a KL regularization with a weight of 0.01. During training, we follow previous works [23] to skip the rollout group with all correct/false answers. During training, we use a learning rate of 1e-6 and set the batch size of 16. For each training sample, we generate a group of 8 rollouts. We train the model for 600 iterations on 8 A100 GPUs. |
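
The setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code (the paper states that code is not released). It assumes binary format and accuracy signals and the commonly used group-normalized advantage from GRPO. The function names `combined_reward`, `keep_group`, and `group_advantages` are hypothetical, and the KL term (weight 0.01) is only recorded as a constant, not applied.

```python
from statistics import mean, pstdev

# Values quoted from the paper's setup description.
FORMAT_WEIGHT = 1.0     # format : accuracy reward weighted 1:1
ACCURACY_WEIGHT = 1.0
KL_WEIGHT = 0.01        # KL regularization weight (not applied in this sketch)
GROUP_SIZE = 8          # rollouts generated per training sample


def combined_reward(format_ok: bool, answer_correct: bool) -> float:
    """Weighted sum of the two rewards; binary signals are an assumption."""
    return FORMAT_WEIGHT * float(format_ok) + ACCURACY_WEIGHT * float(answer_correct)


def keep_group(correct_flags: list[bool]) -> bool:
    """Return False when every rollout is correct or every rollout is wrong.

    Such groups are skipped, as the quoted setup describes.
    """
    return 0 < sum(correct_flags) < len(correct_flags)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (reward - group mean) / group std.

    This is the usual GRPO normalization, assumed here rather than
    confirmed by the quoted text. eps is a hypothetical stabilizer.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of 8 rollouts: all well formatted, 3 correct.
    correct = [True, False, True, False, False, True, False, False]
    assert len(correct) == GROUP_SIZE
    if keep_group(correct):
        rewards = [combined_reward(True, c) for c in correct]
        print(group_advantages(rewards))
```

Skipping uniform groups is consistent with this normalization: when all rewards in a group are equal, every advantage is zero and the group contributes no learning signal.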