VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Authors: Peng Wu, Xuerong Zhou, Guansong Pang, Lingru Zhou, Qingsen Yan, Peng Wang, Yanning Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on two commonly-used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD, surpassing the state-of-the-art methods by a large margin. Specifically, VadCLIP achieves 84.51% AP and 88.02% AUC on XD-Violence and UCF-Crime, respectively. Code and features are released at https://github.com/nwpu-zxr/VadCLIP. Extensive ablations are carried out on the XD-Violence dataset. (A sketch of how these frame-level metrics are typically computed appears after the table.) |
| Researcher Affiliation | Academia | Peng Wu1, Xuerong Zhou1, Guansong Pang2*, Lingru Zhou1, Qingsen Yan1, Peng Wang1, Yanning Zhang1 1ASGO, School of Computer Science, Northwestern Polytechnical University, China 2School of Computing and Information Systems, Singapore Management University, Singapore {xdwupeng, zxr2333}@gmail.com, gspang@smu.edu.sg, {lingruzhou, yqs}@mail.nwpu.edu.cn, {peng.wang, ynzhang}@nwpu.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code and features are released at https://github.com/nwpu-zxr/VadCLIP. |
| Open Datasets | Yes | We conduct experiments on two popular WSVAD datasets, i.e., UCF-Crime and XD-Violence. Notably, training videos only have video-level labels on both datasets. |
| Dataset Splits | No | The paper mentions "training videos only have video-level labels" and later "on the abnormal videos in the test set", but does not provide specific details on training, validation, and test splits (e.g., percentages, counts, or explicit predefined splits). |
| Hardware Specification | Yes | VadCLIP is trained on a single NVIDIA RTX 3090 GPU using PyTorch. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify its version number or versions for any other software libraries or solvers. |
| Experiment Setup | Yes | For hyper-parameters, we set σ in Eq.3 as 1, τ in Eq.8 as 0.07, and the context length l as 20. For window length in LGT-Adapter, we set it as 64 and 8 on XD-Violence and UCF-Crime, respectively. For λ in Eq.10, we set it as 1×10⁻⁴ and 1×10⁻¹ on XD-Violence and UCF-Crime, respectively. For model training, VadCLIP is trained on a single NVIDIA RTX 3090 GPU using PyTorch. We use AdamW as the optimizer with batch size of 64. On XD-Violence, the learning rate and total epoch are set as 2×10⁻⁵ and 20, respectively, and on UCF-Crime, the learning rate and total epoch are set as 1×10⁻⁵ and 10, respectively. (A minimal optimizer sketch based on these values follows the table.) |
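The reported numbers follow the standard WSVAD evaluation protocol: frame-level average precision (AP) on XD-Violence and frame-level AUC on UCF-Crime. The sketch below shows how these metrics are commonly computed with scikit-learn; the function name `evaluate_wsvad` and the toy arrays are illustrative assumptions, not taken from the released code.

```python
# Hedged sketch: frame-level AP and AUC as typically computed for WSVAD.
# `evaluate_wsvad` and the toy inputs are hypothetical, not from the paper's repo.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

def evaluate_wsvad(frame_scores: np.ndarray, frame_labels: np.ndarray) -> dict:
    """Frame-level metrics from per-frame anomaly scores and 0/1 labels."""
    return {
        "AP": average_precision_score(frame_labels, frame_scores),  # XD-Violence metric
        "AUC": roc_auc_score(frame_labels, frame_scores),           # UCF-Crime metric
    }

# Toy usage with six frames.
labels = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.2, 0.8, 0.7, 0.3, 0.9])
print(evaluate_wsvad(scores, labels))
```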
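The experiment-setup row condenses into a small PyTorch configuration. This is a minimal sketch assuming a model that exposes the usual `parameters()` interface; the `nn.Linear` placeholder stands in for the actual VadCLIP architecture (available in the authors' repository), and only the quoted hyper-parameter values come from the paper.

```python
# Minimal optimization sketch from the quoted setup; the model here is a
# placeholder, not the real VadCLIP architecture.
import torch

model = torch.nn.Linear(512, 2)  # stand-in for the actual VadCLIP model

# Per-dataset hyper-parameters quoted in the paper.
config = {
    "XD-Violence": {"lr": 2e-5, "epochs": 20, "lambda": 1e-4, "window": 64},
    "UCF-Crime":   {"lr": 1e-5, "epochs": 10, "lambda": 1e-1, "window": 8},
}

dataset = "XD-Violence"
optimizer = torch.optim.AdamW(model.parameters(), lr=config[dataset]["lr"])
batch_size = 64  # stated for both datasets
```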