Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree

Authors: Wenlong Li, Yifei Xu, Yuan Rao, Zhenhua Wang, Shuiguang Deng

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three challenging datasets demonstrate that VADTree achieves state-of-the-art performance in training-free settings while drastically reducing the number of sampled video segments. The code will be available at https://github.com/wenlongli10/VADTree. 4 Experiments. We validate the performance of VADTree on three datasets against state-of-the-art VAD methods trained with different types of supervision, as well as other training-free baselines. To verify the necessity of each core module, we conduct systematic ablation studies to demonstrate the rationality and effectiveness of VADTree's proposed components. In the following, we first describe the experimental setup in terms of datasets and performance metrics. We then present and discuss the results in Section 4.1, followed by the ablation studies in Section 4.2, and conclude with qualitative experiments in Section 4.3. For more experimental analysis and qualitative results, please refer to the Appendix C.
Researcher Affiliation | Collaboration | Wenlong Li (1), Yifei Xu (1,4), Yuan Rao (1), Zhenhua Wang (2), Shuiguang Deng (3). (1) School of Software, Xi'an Jiaotong University; (2) China Railway Xi'an Group; (3) College of Computer Science and Technology, Zhejiang University; (4) Xi'an Jiaotong University Suzhou Institute. EMAIL EMAIL
Pseudocode | Yes | A Hierarchical Granularity-aware Tree. A.1 Tree Init: Granularity-Aware Binary Tree Construction. Algorithm 1: Tree Init: Granularity-Aware Binary Tree Construction Algorithm (Section 3.1)
Open Source Code | Yes | Extensive experiments on three challenging datasets demonstrate that VADTree achieves state-of-the-art performance in training-free settings while drastically reducing the number of sampled video segments. The code will be available at https://github.com/wenlongli10/VADTree.
Open Datasets | Yes | We evaluate VADTree on three benchmark datasets: UCF-Crime [34], XD-Violence [43], and MSAD [67]. Our empirical results demonstrate that VADTree outperforms unsupervised, one-class, and training-free VAD methods.
Dataset Splits | Yes | Datasets. We evaluate our method using three commonly used VAD datasets featuring real-world surveillance scenarios, i.e., UCF-Crime [34], XD-Violence [43], and MSAD [67]. UCF-Crime is a large-scale dataset comprising 1900 long untrimmed real-world surveillance videos with 13 types of anomalies. The training set consists of 800 normal and 810 anomalous videos, while the test set includes 150 normal and 140 anomalous videos. XD-Violence is another large-scale dataset for violence detection, comprising 4754 untrimmed videos with audio signals and weak labels that are collected from both movies and YouTube. XD-Violence captures 6 categories of anomalies and it is divided into a training set of 3954 videos and a test set of 800 videos. We also evaluate VADTree on MSAD dataset, which provides a greater diversity of real-world scenarios than existing benchmarks.
Hardware Specification | Yes | We display the total inference time (GPU hours) of LAVAD and VADTree on two NVIDIA GeForce RTX 3090 GPUs in Table 16.
Software Dependencies | No | The paper references specific models like LLaVA-Video-7B-Qwen2 and DeepSeek-R1-Distill-Qwen-14B, and mentions using EfficientGEBD [64] and ImageBind [12] as the video encoder. While these are specific tools and models, the paper does not provide version numbers for general software dependencies like Python, PyTorch, or CUDA, which are typically required for full reproducibility of the software environment.
Experiment Setup | Yes | Implementation Details. We use EfficientGEBD [64] as the model f_GEBD for generic event boundary knowledge acquisition, and the overlapping sampling window length l_raw follows the 10s window of Kinetics-GEBD [31]. The video description model f_VLM and the anomaly reasoning model f_LLM use LLaVA-Video-7B-Qwen2 [63] and DeepSeek-R1-Distill-Qwen-14B [8] respectively. In all experiments, the VLM input is configured to a maximum of 64 frames, with the LLM having the thinking mode turned on by default. C.1 More Experimental Details. Based on the experimental details described in Section 4, the γ_min = 0.4 and K-Means clustering algorithm are used to generate the HGTree for inference. In the inter-cluster node refinement process, we implemented a top-K control for the final weighted neighborhood node numbers. Additionally, this process also includes the temperature parameter τ of softmax. In the Inter-cluster Node Correlation, the hyperparameter β affects the weight of coarse and fine clusters in the final anomaly score.
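
The split sizes quoted in the Dataset Splits row can be checked for internal consistency with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the helper name `check_splits` are assumptions made for this example and do not come from the paper or its repository. The numbers are copied from the quoted text. MSAD is left out because the quoted passage gives no split sizes for it.

```python
# Split sizes as quoted in the Dataset Splits row.
# The structure and names below are hypothetical, chosen only for this check.
SPLITS = {
    "UCF-Crime": {
        "reported_total": 1900,
        "train": 800 + 810,   # normal + anomalous training videos
        "test": 150 + 140,    # normal + anomalous test videos
    },
    "XD-Violence": {
        "reported_total": 4754,
        "train": 3954,
        "test": 800,
    },
}


def check_splits(splits):
    """Map each dataset name to True if train + test equals the reported total."""
    return {
        name: s["train"] + s["test"] == s["reported_total"]
        for name, s in splits.items()
    }


if __name__ == "__main__":
    for name, ok in check_splits(SPLITS).items():
        print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

Both quoted totals agree with their parts (1610 + 290 = 1900 and 3954 + 800 = 4754), so the split description is self-consistent for these two datasets.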