Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns

Authors: Menghao Zhang, Huazheng Wang, Pengfei Ren, Kangheng Lin, Qi Qi, Haifeng Sun, Zirui Zhuang, Lei Zhang, Jianxin Liao, Jingyu Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on six benchmark datasets demonstrate the effectiveness of VAD-DPO in enhancing both anomaly detection and reasoning performance, particularly in scene-dependent scenarios.
Researcher Affiliation	Collaboration	(1) State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China; (2) China Unicom, Beijing, China
Pseudocode	No	The paper describes the proposed method, VAD-DPO, using textual descriptions and mathematical equations (e.g., L_DPO and L_total in Sections 4.1 and 4.2), but it does not include a clearly labeled pseudocode block or algorithm steps.
Open Source Code	Yes	We include part of the core code in the supplementary material.
Open Datasets	Yes	We evaluate our method on six real-world surveillance datasets commonly used in VAD: ShanghaiTech [19], UCF-Crime [28], XD-Violence [35], NWPU Campus [2], MSAD [50], and HIVAU-70K [44].
Dataset Splits	No	The paper lists several benchmark datasets (ShanghaiTech, UCF-Crime, XD-Violence, NWPU Campus, MSAD, HIVAU-70K) in Section 5.1 and discusses the construction of 1,000 preference pairs for training VAD-DPO in Appendix A.3. However, it does not explicitly provide the specific training/validation/test splits (e.g., percentages or sample counts) used when evaluating on these benchmark datasets.
Hardware Specification	Yes	All training is performed on an internal cluster equipped with 2 NVIDIA A100 GPUs (80GB memory each), using distributed data parallelism via PyTorch's torch.distributed module.
Software Dependencies	No	The paper mentions using PyTorch's torch.distributed module and the Hugging Face implementation, but it does not specify concrete version numbers for these software components, for other libraries, or for Python or CUDA.
Experiment Setup	Yes	We set the learning rate to 1e-4 and adopt a cosine learning rate scheduler with a warm-up ratio of 0.05. The default value of γ is set to 1 in Equation 6. All models are trained for a single epoch.
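The Pseudocode row notes that VAD-DPO is specified only through equations such as L_DPO. As a point of reference, the following is a minimal sketch of the standard direct preference optimization (DPO) loss from Rafailov et al. (2023), computed on per-sequence log-probabilities. It is an assumption that the paper's L_DPO follows this generic form; the function name `dpo_loss` and the temperature `beta` are illustrative, and no claim is made about how `beta` relates to the γ in the paper's Equation 6 or to L_total.

```python
import math
from typing import Sequence


def dpo_loss(policy_chosen: Sequence[float], policy_rejected: Sequence[float],
             ref_chosen: Sequence[float], ref_rejected: Sequence[float],
             beta: float = 0.1) -> float:
    """Mean generic DPO loss over preference pairs (hypothetical helper).

    Each argument holds log p(y | x) for the preferred ("chosen") or
    dispreferred ("rejected") response under the trainable policy or the
    frozen reference model. This is the textbook DPO objective, not
    necessarily the paper's exact L_DPO.
    """
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin: difference of policy/reference log-ratios.
        margin = beta * ((pc - rc) - (pr - rr))
        # -log(sigmoid(margin)) == softplus(-margin), written in a stable form.
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses)
```

When the policy and reference assign identical log-ratios, the margin is zero and the loss equals log 2; the loss falls below log 2 as the policy raises the chosen response relative to the rejected one.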
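The Experiment Setup row reports a learning rate of 1e-4 with a cosine scheduler and a warm-up ratio of 0.05 over a single epoch. A minimal sketch of such a schedule follows, assuming linear warm-up from 0 and a half-cosine decay to 0, which mirrors the behavior of the Hugging Face `get_cosine_schedule_with_warmup` helper; the function name `cosine_lr_with_warmup` and the total step count used below are hypothetical, since the paper does not report the number of optimizer steps.

```python
import math


def cosine_lr_with_warmup(step: int, total_steps: int, base_lr: float = 1e-4,
                          warmup_ratio: float = 0.05) -> float:
    """Learning rate at 0-indexed `step`: linear warm-up, then cosine decay to 0."""
    # Number of warm-up steps derived from the ratio (rounded up).
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 towards base_lr during warm-up.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warm-up phase completed, clipped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    # Half-cosine decay: base_lr at progress 0, 0 at progress 1.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With a hypothetical 1,000-step epoch, the first 50 steps ramp up linearly, the rate peaks at 1e-4 at step 50, and it reaches half that value midway through the decay phase (step 525).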