Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding

Authors: Yiyang Zhou, Yangfan He, Yaofeng Su, Siwei Han, Joel Jang, Gedas Bertasius, Mohit Bansal, Huaxiu Yao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on 12 datasets across three core applications: video understanding, video reasoning enhancement, and vision-language-action model alignment demonstrate significant gains in generalization and reasoning, with improvements of up to 6.9%, 2.1%, and 9.8%, respectively, highlighting the effectiveness and versatility of the proposed framework.
Researcher Affiliation | Academia | 1UNC-Chapel Hill, 2University of Washington
Pseudocode | Yes | Algorithm 1 ReAgent-V Video Understanding Pipeline
Open Source Code | Yes | Our code are available at https://github.com/aiming-lab/ReAgent-V.
Open Datasets | Yes | Extensive experiments on 12 datasets across three core applications: video understanding, video reasoning enhancement, and vision-language-action model alignment demonstrate significant gains... All datasets are public datasets.
Dataset Splits | Yes | Specifically, we apply ReAgent-V to video samples from the Video-R1-260k [10] dataset, using the scores in E output by ReAgent-V during inference as the sample importance score. We retain videos and their original questions with importance scores lower than 5 (out of 10)... The remaining three generalization tasks are kept unseen and excluded from alignment. After data collection, we use ReAgent-V to evaluate each trajectory segment, assigning scalar reward scores... For each task, we sample 20 best-vs-worst trajectory pairs... which are then partitioned into two RLDS-formatted datasets: a chosen set containing high-reward trajectories and a rejected set with low-reward counterparts.
Hardware Specification | Yes | Experiments are run on two H100-96GB GPUs with NVLink... We then fine-tune the Qwen2.5-VL model on 8 NVIDIA H100 80GB GPUs... We conduct all VLA alignment training on a single NVIDIA A100 GPU (80GB).
Software Dependencies | No | Videos are processed using decord and ffmpeg, with audio chunked for transcription.
Experiment Setup | Yes | During evaluation, both the target and critic models are set to the ReAgent-V-enhanced model itself. Detailed descriptions of the task prompt templates, the tool list, benchmark datasets, and baseline models are provided in Appendix A and B... Experiments are run on two H100-96GB GPUs with NVLink, using 64 frames per video... Training employs LoRA (rank=32) with a learning rate of 2e-5 and dropout of 0.0. We use gradient accumulation with a step size of 4... All results reported in the main paper are obtained after 6,800 steps of alignment training.
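The data selection described in the Dataset Splits entry can be illustrated with a minimal Python sketch. It is not the authors' code: the `ScoredSample` type and both function names are hypothetical, and the pairing rule (top-ranked trajectories as chosen, bottom-ranked as rejected) is an assumption, since the quoted text does not say how the 20 best-vs-worst pairs are sampled. Only the threshold of 5 (out of 10) and the pair count of 20 come from the quoted text.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Quoted in the Dataset Splits entry: retain samples scoring lower than 5 (out of 10).
IMPORTANCE_THRESHOLD = 5.0


@dataclass
class ScoredSample:
    """Hypothetical record: one video sample or trajectory with its score."""
    sample_id: str
    score: float  # evaluation or reward score on a 0-10 scale


def select_hard_samples(
    samples: List[ScoredSample], threshold: float = IMPORTANCE_THRESHOLD
) -> List[ScoredSample]:
    """Keep only samples whose score is strictly below the threshold."""
    return [s for s in samples if s.score < threshold]


def best_vs_worst_pairs(
    trajectories: List[ScoredSample], num_pairs: int = 20
) -> Tuple[List[ScoredSample], List[ScoredSample]]:
    """Split trajectories into a chosen set and a rejected set.

    Assumption: the i-th best trajectory is paired with the i-th worst.
    The pair count is capped so that no trajectory lands in both sets.
    """
    ranked = sorted(trajectories, key=lambda t: t.score, reverse=True)
    num_pairs = min(num_pairs, len(ranked) // 2)
    chosen = ranked[:num_pairs]          # highest-scoring first
    rejected = ranked[::-1][:num_pairs]  # lowest-scoring first
    return chosen, rejected
```

Under these assumptions, `select_hard_samples` would model the filtering of Video-R1-260k samples, and `best_vs_worst_pairs` would be called once per task to produce the chosen and rejected sets before they are written out in RLDS format.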