Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Do LVLMs Truly Understand Video Anomalies? Revealing Hallucination via Co-Occurrence Patterns
Authors: Menghao Zhang, Huazheng Wang, Pengfei Ren, Kangheng Lin, Qi Qi, Haifeng Sun, Zirui Zhuang, Lei Zhang, Jianxin Liao, Jingyu Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on six benchmark datasets demonstrate the effectiveness of VAD-DPO in enhancing both anomaly detection and reasoning performance, particularly in scene-dependent scenarios. |
| Researcher Affiliation | Collaboration | 1State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China 2China Unicom, Beijing, China |
| Pseudocode | No | The paper describes the proposed method, VAD-DPO, using textual descriptions and mathematical equations (e.g., $\mathcal{L}_{\text{DPO}}$ and $\mathcal{L}_{\text{total}}$ in Sections 4.1 and 4.2), but it does not include a clearly labeled pseudocode block or step-by-step algorithm. |
| Open Source Code | Yes | We include part of the core code in the supplementary material. |
| Open Datasets | Yes | We evaluate our method on six real-world surveillance datasets commonly used in VAD: ShanghaiTech [19], UCF-Crime [28], XD-Violence [35], NWPU Campus [2], MSAD [50], and HIVAU-70K [44]. |
| Dataset Splits | No | The paper lists several benchmark datasets (ShanghaiTech, UCF-Crime, XD-Violence, NWPU Campus, MSAD, HIVAU-70K) in Section 5.1 and discusses the construction of 1,000 preference pairs for training VAD-DPO in Appendix A.3. However, it does not explicitly provide the training/validation/test splits (e.g., percentages or sample counts) used when evaluating on these benchmark datasets. |
| Hardware Specification | Yes | All training is performed on an internal cluster equipped with 2 NVIDIA A100 GPUs (80GB memory each), using distributed data parallelism via PyTorch's torch.distributed module. |
| Software Dependencies | No | The paper mentions using PyTorch's torch.distributed module and the Hugging Face implementation, but it does not specify version numbers for these components or for other software such as Python or CUDA. |
| Experiment Setup | Yes | We set the learning rate to 1e-4 and adopt a cosine learning rate scheduler with a warm-up ratio of 0.05. The default value of γ is set to 1 in Equation 6. All models are trained for a single epoch. |
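The Pseudocode row notes that VAD-DPO is defined through loss equations such as $\mathcal{L}_{\text{DPO}}$ rather than an algorithm block. The sketch below is for orientation only. It implements the standard Direct Preference Optimization loss for a single preference pair, using the per-response log-probabilities of a chosen and a rejected answer under the trained policy and a frozen reference model. It is the generic formulation, not necessarily the paper's exact $\mathcal{L}_{\text{DPO}}$. The function names and the default `beta=0.1` are assumptions. `beta` is the usual DPO temperature and should not be confused with the paper's γ, whose role in Equation 6 this table does not describe.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair (hypothetical helper).

    Each *_logp argument is the summed log-probability of a full response
    given the same prompt; beta=0.1 is an assumed default.
    """
    # Policy-to-reference log-ratio for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled preference margin: positive when the policy favors the chosen
    # response more than the reference model does.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin); computed without overflow.
    return softplus(-margin)


# Hypothetical log-probabilities where the policy already leans toward
# the chosen response relative to the reference model.
print(dpo_loss(-1.0, -5.0, -2.0, -4.0))
```

When the policy matches the reference exactly, the margin is zero and the loss equals log 2. The loss drops below log 2 as the policy shifts probability toward the chosen response.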
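The Software Dependencies row is marked No because the paper gives no version numbers. The helper below illustrates one way to meet that criterion. It records the Python version and the installed versions of selected packages, using only the standard library (`platform.python_version` and `importlib.metadata.version`). The package list (`torch` for PyTorch, `transformers` for the Hugging Face library) and the helper name are assumptions about what the paper's code uses.

```python
import platform
from importlib import metadata


def dependency_report(packages=("torch", "transformers")) -> dict:
    """Map 'python' and each listed distribution to its installed version.

    The default package list is an assumption, not taken from the paper.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record absence explicitly instead of raising.
            report[name] = "not installed"
    return report


# Print one "name: version" line per entry, e.g. for a README or run log.
for name, version in dependency_report().items():
    print(f"{name}: {version}")
```

If PyTorch is installed, `torch.version.cuda` also reports the CUDA version that the build was compiled against, which covers the CUDA gap noted in the row.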
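The Experiment Setup row specifies a base learning rate of 1e-4, a cosine scheduler, and a warm-up ratio of 0.05. The sketch below shows one common reading of that configuration, modeled on the shape of Hugging Face's `get_cosine_schedule_with_warmup`. The rate rises linearly during the first 5% of optimizer steps and then decays along a half cosine to zero. The exact scheduler code is not given here, so the function name `lr_at_step`, the rounding of the warm-up length with `ceil`, and the decay-to-zero floor are assumptions. `total_steps` would be the number of optimizer steps in the single training epoch, which depends on a batch size the table does not report.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 1e-4,
               warmup_ratio: float = 0.05) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape).

    Linear warm-up from 0 to base_lr over the first warmup_ratio of steps,
    then cosine decay from base_lr to 0 over the remaining steps.
    """
    # Rounding the warm-up length up with ceil is an assumption.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warm-up phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warm-up phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: the factor goes from 1 to 0 as progress goes 0 to 1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical run of 1,000 optimizer steps in the single epoch.
for s in (0, 25, 50, 525, 1000):
    print(s, lr_at_step(s, total_steps=1000))
```

With 1,000 hypothetical steps, warm-up lasts 50 steps. The rate peaks at 1e-4 at step 50, falls to about half that at step 525, and reaches zero at step 1,000.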