Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

Authors: Chao Huang, Benfeng Wang, Wei Wang, Jie Wen, Chengliang Liu, Li Shen, Xiaochun Cao

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that Vad-R1 achieves superior performance, outperforming both open-source and proprietary models on VAD and VAR tasks.
Researcher Affiliation	Academia	1 Shenzhen Campus of Sun Yat-sen University; 2 Harbin Institute of Technology, Shenzhen; 3 Laboratory for Artificial Intelligence in Design, The Hong Kong Polytechnic University; 4 Shenzhen Loop Area Institute
Pseudocode	Yes	Algorithm 1 (Anomaly verification reward). Input: current video $v$, policy model $\pi_\theta$, generated completions $O = \{o_i\}_{i=1}^{G}$. Output: anomaly verification reward $R_{ano}$. Algorithm 2 (AVA-GRPO). Input: Vad-Reasoning-RL dataset $D = \{(v_j, Y_j)\}_{j=1}^{N}$, initial policy model $\pi_{\theta_{init}}$. Output: updated policy model $\pi_\theta$.
Open Source Code	Yes	Codes and datasets will be released at https://github.com/wbfwonderful/Vad-R1.
Open Datasets	Yes	Based on the structured P2C-CoT, we construct Vad-Reasoning, a dedicated dataset for VAR. ... Codes and datasets will be released at https://github.com/wbfwonderful/Vad-R1.
Dataset Splits	Yes	In total, the proposed Vad-Reasoning dataset contains 8203 videos for training and 438 videos for test. As shown in Figure 2(c), the training set of Vad-Reasoning is split into two subsets: Vad-Reasoning-SFT which contains 1755 videos annotated with high-quality reasoning process, and Vad-Reasoning-RL which contains 6448 videos with video-level weak labels.
Hardware Specification	Yes	All experiments are conducted with 4 NVIDIA A100 (80GB) GPUs.
Software Dependencies	No	The paper mentions Qwen-Max [59], Qwen-VL-Max [61], and Qwen-2.5-VL-7B [61] as models used, but does not provide specific version numbers for ancillary software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup	Yes	For the first stage, supervised fine-tuning is performed on the Vad-Reasoning-SFT dataset for four epochs. For the second stage, RL is performed with AVA-GRPO for one epoch... The learning rates for both stages are set to $1 \times 10^{-6}$. The number of completions generated in a group is set to 4. The hyperparameter β in Equation 3 is set as 0.04.
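The reported setup can be collected into a small sketch that also checks the dataset split arithmetic (1755 SFT videos + 6448 RL videos = 8203 training videos). It assumes that AVA-GRPO keeps the standard GRPO machinery from DeepSeekMath: rewards normalized within each group of G = 4 completions, a clipped surrogate, and a KL penalty weighted by β = 0.04 (presumably the β of the paper's Equation 3). The anomaly verification reward of Algorithm 1 is not described in this summary and is not modeled. The helper names, the clipping range `CLIP_EPS = 0.2`, and the use of the population standard deviation are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from statistics import mean, pstdev

# Values reported in the Experiment Setup and Dataset Splits rows.
REPORTED_SETUP = {
    "sft_epochs": 4,          # stage 1: supervised fine-tuning on Vad-Reasoning-SFT
    "rl_epochs": 1,           # stage 2: AVA-GRPO on Vad-Reasoning-RL
    "learning_rate": 1e-6,    # same for both stages
    "group_size": 4,          # completions G generated per group
    "beta": 0.04,             # KL coefficient (assumed to be the beta of Equation 3)
    "sft_videos": 1755,
    "rl_videos": 6448,
    "train_videos": 8203,
    "test_videos": 438,
}

# Consistency check: the two training subsets add up to the full training set.
assert REPORTED_SETUP["sft_videos"] + REPORTED_SETUP["rl_videos"] == REPORTED_SETUP["train_videos"]

CLIP_EPS = 0.2  # hypothetical clipping range; not reported in the summary above


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage: z-score each reward within its group."""
    if len(rewards) != REPORTED_SETUP["group_size"]:
        raise ValueError("expected one reward per completion in the group")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (assumption; some code uses the sample std)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_estimate(logp_policy, logp_ref):
    """Per-token KL estimate used by GRPO: rho - log(rho) - 1, with rho = pi_ref / pi_theta."""
    log_rho = logp_ref - logp_policy
    return math.exp(log_rho) - log_rho - 1.0


def token_objective(logp_policy, logp_old, logp_ref, advantage):
    """Clipped surrogate minus the beta-weighted KL penalty for one token (to be maximized)."""
    ratio = math.exp(logp_policy - logp_old)
    clipped = min(max(ratio, 1.0 - CLIP_EPS), 1.0 + CLIP_EPS)
    surrogate = min(ratio * advantage, clipped * advantage)
    return surrogate - REPORTED_SETUP["beta"] * kl_estimate(logp_policy, logp_ref)


if __name__ == "__main__":
    # Hypothetical rewards for the four completions generated for one video.
    advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
    print([round(a, 6) for a in advantages])
    print(token_objective(-1.2, -1.2, -1.0, advantages[0]))
```

Normalizing within each group of four completions means that only the relative quality of completions for the same video drives the update; a group in which every completion receives the same reward yields zero advantages and contributes only through the KL term.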