VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

Authors: Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on two commonly-used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD, surpassing the state-of-the-art methods by a large margin. Specifically, VadCLIP achieves 84.51% AP and 88.02% AUC on XD-Violence and UCF-Crime, respectively. Code and features are released at https://github.com/nwpu-zxr/VadCLIP. Extensive ablations are carried out on XD-Violence dataset.
Researcher Affiliation | Academia | Peng Wu1, Xuerong Zhou1, Guansong Pang2*, Lingru Zhou1, Qingsen Yan1, Peng Wang1, Yanning Zhang1; 1ASGO, School of Computer Science, Northwestern Polytechnical University, China; 2School of Computing and Information Systems, Singapore Management University, Singapore. {xdwupeng, zxr2333}@gmail.com, gspang@smu.edu.sg, {lingruzhou, yqs}@mail.nwpu.edu.cn, {peng.wang, ynzhang}@nwpu.edu.cn
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code and features are released at https://github.com/nwpu-zxr/VadCLIP.
Open Datasets | Yes | We conduct experiments on two popular WSVAD datasets, i.e., UCF-Crime and XD-Violence. Notably, training videos only have video-level labels on both datasets.
Dataset Splits | No | The paper mentions "training videos only have video-level labels" and later "on the abnormal videos in the test set", but does not provide specific details on training, validation, and test splits (e.g., percentages, counts, or explicit predefined splits).
Hardware Specification | Yes | VadCLIP is trained on a single NVIDIA RTX 3090 GPU using PyTorch.
Software Dependencies | No | The paper mentions "PyTorch" but does not specify its version, nor versions for any other software libraries or solvers.
Experiment Setup | Yes | For hyper-parameters, we set σ in Eq. 3 as 1, τ in Eq. 8 as 0.07, and the context length l as 20. For window length in LGT-Adapter, we set it as 64 and 8 on XD-Violence and UCF-Crime, respectively. For λ in Eq. 10, we set it as 1×10⁻⁴ and 1×10⁻¹ on XD-Violence and UCF-Crime, respectively. For model training, VadCLIP is trained on a single NVIDIA RTX 3090 GPU using PyTorch. We use AdamW as the optimizer with a batch size of 64. On XD-Violence, the learning rate and total epochs are set as 2×10⁻⁵ and 20, respectively, and on UCF-Crime, the learning rate and total epochs are set as 1×10⁻⁵ and 10, respectively.
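
To make the quoted setup easier to scan, the sketch below collects the reported training configuration in PyTorch-style Python. Only the hyperparameter values (σ, τ, context length, window length, λ, learning rates, epochs, batch size, AdamW optimizer) come from the paper; the model class, dataloader, and loss interface are hypothetical placeholders and not the authors' released code.

```python
import torch

# Optimization settings quoted in the paper's Experiment Setup.
# Model, dataloader, and loss API below are hypothetical placeholders.
HPARAMS = {
    "xd-violence": {"window_len": 64, "lambda": 1e-4, "lr": 2e-5, "epochs": 20},
    "ucf-crime":   {"window_len": 8,  "lambda": 1e-1, "lr": 1e-5, "epochs": 10},
}
SIGMA = 1.0          # sigma in Eq. 3
TAU = 0.07           # temperature tau in Eq. 8
CONTEXT_LENGTH = 20  # learnable prompt context length
BATCH_SIZE = 64      # batch size for the (placeholder) dataloader


def train(dataset: str, model: torch.nn.Module, dataloader) -> None:
    """Weakly supervised training loop using only video-level labels."""
    cfg = HPARAMS[dataset]
    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg["lr"])
    model.train()
    for _ in range(cfg["epochs"]):
        for clip_features, video_labels in dataloader:
            # Hypothetical forward pass: the model is assumed to return the
            # total training loss (Eq. 10), with the lambda-weighted term
            # applied internally.
            loss = model(clip_features, video_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

This sketch only mirrors the optimization settings reported in the table; the actual architecture, loss terms, and data pipeline are defined in the authors' repository linked above.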