Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Generalizing Single-Frame Supervision to Event-Level Understanding for Video Anomaly Detection

Authors: Junxi Chen, Liang Li, Yunbin Tu, Li Su, Zhe Xue, Qingming Huang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show SF-VAD achieves state-of-the-art detection results while offering a favorable trade-off between performance and annotation cost.
Researcher Affiliation | Academia | 1 University of Chinese Academy of Sciences; 2 Key Laboratory of Intelligent Information Processing, ICT, CAS; 3 Beijing University of Posts and Telecommunications
Pseudocode | Yes | Algorithm 1 Abnormal Event Mining
Open Source Code | Yes | The benchmarks and code are available at https://github.com/Junxi-Chen/SF-VAD.
Open Datasets | Yes | To validate its effectiveness, we construct three SF-VAD benchmarks by manually re-annotating the Shanghai Tech, UCF-Crime, and XD-Violence datasets in a practical procedure.
Dataset Splits | Yes | To validate the effectiveness of proposed paradigm, we construct three high-quality, human-annotated SF-VAD datasets based on the public benchmarks: Shanghai Tech [21], UCF-Crime [34], and XD-Violence [46]. For videos in the test set, the estimated annotation time is equivalent to the total duration of the test videos.
Hardware Specification | Yes | All experiments are conducted on a single NVIDIA RTX 3090 GPU using PyTorch.
Software Dependencies | No | All experiments are conducted on a single NVIDIA RTX 3090 GPU using PyTorch.
Experiment Setup | Yes | Hyperparameter. The hidden dimension Dh of transformer-based temporal modeling module is set to 128. The initial gate weight α of transformer-based temporal modeling module is set to 0.5. The window size w is set to 5, 9, and 9 for Shanghai Tech, UCF-Crime, and XD-Violence, respectively. The kernel size and stride of the one-dimensional convolutional layer ft are set to 3 and 1, respectively. In abnormal event mining algorithm, the threshold θ1 that filters the total variance of similarity is set to 0.1. The threshold θ2 that controls the prominence of similarity of key frames is set to 0.96. The threshold θ3 that controls the gap of abnormal events is set to 0.2. Training Details. ... The batch size is set to 128. The learning rate is 5e-4 initially and controlled by a cosine decay strategy. The parameters are optimized using Adam optimizer. The number of training epochs is set to 50.
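
For readers who want the Experiment Setup values in one place, the following is a minimal sketch that collects the quoted hyperparameters into a config object and shows one common form of cosine learning-rate decay. It is not taken from the authors' repository: the names `TrainConfig`, `WINDOW_SIZE`, and `cosine_lr` are hypothetical, and the per-epoch stepping and the final learning rate of 0 are assumptions, since the quoted text only says the rate is "controlled by a cosine decay strategy".

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    hidden_dim: int = 128           # Dh of the temporal modeling module
    init_gate_weight: float = 0.5   # initial gate weight alpha
    conv_kernel_size: int = 3       # 1-D convolutional layer ft
    conv_stride: int = 1
    theta1: float = 0.1             # filters total variance of similarity
    theta2: float = 0.96            # prominence of key-frame similarity
    theta3: float = 0.2             # gap between abnormal events
    batch_size: int = 128
    base_lr: float = 5e-4
    epochs: int = 50


# Window size w per dataset, as quoted in the setup description.
WINDOW_SIZE = {"Shanghai Tech": 5, "UCF-Crime": 9, "XD-Violence": 9}


def cosine_lr(epoch: int, cfg: TrainConfig, min_lr: float = 0.0) -> float:
    """Cosine decay from cfg.base_lr to min_lr over cfg.epochs.

    Assumptions (not stated in the source): the schedule steps once per
    epoch and decays to min_lr = 0.
    """
    # Clamp so that out-of-range epochs do not push the cosine past its end points.
    progress = min(max(epoch, 0), cfg.epochs) / cfg.epochs
    return min_lr + 0.5 * (cfg.base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (0, 10, 25, 40, 50):
        print(f"epoch {epoch:2d}: lr = {cosine_lr(epoch, cfg):.6f}")
```

In a PyTorch training loop, the same schedule would typically be obtained by pairing `torch.optim.Adam` with `torch.optim.lr_scheduler.CosineAnnealingLR`. Whether the authors use that scheduler is not stated in the quoted text.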