Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Generalizing Single-Frame Supervision to Event-Level Understanding for Video Anomaly Detection
Authors: Junxi Chen, Liang Li, Yunbin Tu, Li Su, Zhe Xue, Qingming Huang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show SF-VAD achieves state-of-the-art detection results while offering a favorable trade-off between performance and annotation cost. |
| Researcher Affiliation | Academia | ¹University of Chinese Academy of Sciences; ²Key Laboratory of Intelligent Information Processing, ICT, CAS; ³Beijing University of Posts and Telecommunications |
| Pseudocode | Yes | Algorithm 1 Abnormal Event Mining |
| Open Source Code | Yes | The benchmarks and code are available at https://github.com/Junxi-Chen/SF-VAD. |
| Open Datasets | Yes | To validate its effectiveness, we construct three SF-VAD benchmarks by manually re-annotating the Shanghai Tech, UCF-Crime, and XD-Violence datasets in a practical procedure. |
| Dataset Splits | Yes | To validate the effectiveness of proposed paradigm, we construct three high-quality, human-annotated SF-VAD datasets based on the public benchmarks: Shanghai Tech [21], UCF-Crime [34], and XD-Violence [46]. For videos in the test set, the estimated annotation time is equivalent to the total duration of the test videos. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA RTX 3090 GPU using PyTorch. |
| Software Dependencies | No | All experiments are conducted on a single NVIDIA RTX 3090 GPU using PyTorch. |
| Experiment Setup | Yes | Hyperparameter. The hidden dimension Dₕ of transformer-based temporal modeling module is set to 128. The initial gate weight α of transformer-based temporal modeling module is set to 0.5. The window size w is set to 5, 9, and 9 for Shanghai Tech, UCF-Crime, and XD-Violence, respectively. The kernel size and stride of the one-dimensional convolutional layer fₜ are set to 3 and 1, respectively. In abnormal event mining algorithm, the threshold θ₁ that filters the total variance of similarity is set to 0.1. The threshold θ₂ that controls the prominence of similarity of key frames is set to 0.96. The threshold θ₃ that controls the gap of abnormal events is set to 0.2. Training Details. ... The batch size is set to 128. The learning rate is 5e-4 initially and controlled by a cosine decay strategy. The parameters are optimized using Adam optimizer. The number of training epochs is set to 50. |
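
For readers who want to reuse the reported settings, the values in the Experiment Setup row can be collected into a single configuration object. The sketch below is illustrative only and is not taken from the authors' repository: the class name `SFVADConfig`, its field names, and the helper `cosine_lr` are hypothetical. The quoted text says only that the learning rate follows "a cosine decay strategy", so the per-epoch schedule and the final learning rate of 0 used here are assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SFVADConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    hidden_dim: int = 128            # D_h of the temporal modeling module
    init_gate_weight: float = 0.5    # initial gate weight alpha
    conv_kernel_size: int = 3        # 1-D convolutional layer f_t
    conv_stride: int = 1
    theta1: float = 0.1              # filters total variance of similarity
    theta2: float = 0.96             # prominence of key-frame similarity
    theta3: float = 0.2              # gap between abnormal events
    batch_size: int = 128
    base_lr: float = 5e-4
    epochs: int = 50
    # Window size w per dataset, keyed by the dataset names as quoted above.
    window_size: dict = field(default_factory=lambda: {
        "Shanghai Tech": 5,
        "UCF-Crime": 9,
        "XD-Violence": 9,
    })


def cosine_lr(epoch: int, cfg: SFVADConfig, min_lr: float = 0.0) -> float:
    """Cosine decay from cfg.base_lr to min_lr over cfg.epochs.

    Assumption: decay is applied once per epoch and ends at min_lr = 0;
    the source does not state either detail.
    """
    progress = min(max(epoch, 0), cfg.epochs) / cfg.epochs
    return min_lr + 0.5 * (cfg.base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


cfg = SFVADConfig()
schedule = [cosine_lr(e, cfg) for e in range(cfg.epochs + 1)]
```

Under these assumptions the schedule starts at the reported 5e-4 and decreases monotonically across the 50 epochs. In a PyTorch training loop, the same shape could be obtained by pairing the Adam optimizer with a cosine annealing scheduler, though whether the authors step the scheduler per epoch or per iteration is not stated in the quoted text.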