Compressed Video Prompt Tuning

Authors: Bing Li, Jiaxin Chen, Xiuguo Bao, Di Huang

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive evaluations on HMDB-51, UCF-101 and Something-Something v2 demonstrate that CVPT remarkably outperforms the state-of-the-art counterparts, delivering a much better balance between accuracy and efficiency. |
| Researcher Affiliation | Academia | Bing Li (1,2), Jiaxin Chen (2), Xiuguo Bao (3), Di Huang (1,2). 1: SKLSDE, Beihang University, Beijing, China; 2: IRIP Lab, SCSE, Beihang University, Beijing, China; 3: CNCERT/CC, Beijing, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 2 illustrates the framework but is not pseudocode. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | HMDB-51 and UCF-101 are two relatively small datasets, which contain 6,766 videos from 51 action categories and 13,320 videos from 101 categories, respectively. Something-Something v2 (SSv2) is a large-scale motion-centric video dataset, including 168,913 videos for training and 24,777 videos for validation from 174 categories. |
| Dataset Splits | Yes | Something-Something v2 (SSv2) is a large-scale motion-centric video dataset, including 168,913 videos for training and 24,777 videos for validation from 174 categories. |
| Hardware Specification | Yes | We adopt the original model configurations and train the prompt parameters using the AdamW optimizer [35] on 12 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, ViT and the Swin Transformer, but does not provide version numbers for any of them. |
| Experiment Setup | Yes | The base learning rate, weight decay and batch size are set to 1 × 10⁻³, 1 × 10⁻⁴ and 240, respectively. Additionally, we adopt a warm-up strategy within the first 5 training epochs. |
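
For quick reference, the dataset statistics quoted in the Open Datasets and Dataset Splits rows can be collected into a small Python record. This is our own convenience summary; the structure and key names are ours, not the paper's:

```python
# Dataset statistics exactly as quoted above; keys and layout are our own.
DATASET_STATS = {
    "HMDB-51": {"videos": 6_766, "classes": 51},
    "UCF-101": {"videos": 13_320, "classes": 101},
    "Something-Something v2": {
        "train_videos": 168_913,
        "val_videos": 24_777,
        "classes": 174,
    },
}
```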
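
The Hardware Specification and Experiment Setup rows together pin down the optimizer (AdamW), the base learning rate (1 × 10⁻³), the weight decay (1 × 10⁻⁴), the batch size (240) and a 5-epoch warm-up, but not the warm-up shape or the post-warm-up schedule. Below is a minimal PyTorch sketch of one consistent reading; the stand-in prompt parameters, the *linear* warm-up, the per-epoch stepping and the total epoch count are all our assumptions, not details from the paper:

```python
import torch

# Hypothetical stand-in for the CVPT prompt parameters; per the quoted text,
# only the prompt parameters are trained, not the backbone.
prompt_params = [torch.nn.Parameter(torch.zeros(8, 768))]

optimizer = torch.optim.AdamW(
    prompt_params,
    lr=1e-3,            # base learning rate from the paper
    weight_decay=1e-4,  # weight decay from the paper
)

warmup_epochs = 5   # stated in the paper
total_epochs = 50   # assumed; not given in the quoted text

def lr_lambda(epoch: int) -> float:
    # Linear warm-up over the first 5 epochs (shape is our assumption),
    # then a constant factor; the post-warm-up schedule is unspecified.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... forward/backward over batches of global size 240
    #     (distributed across 12 V100 GPUs in the paper) ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```

A per-step rather than per-epoch warm-up would be equally consistent with the quoted text; the sketch simply picks one reading.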