Compressed Video Prompt Tuning
Authors: Bing Li, Jiaxin Chen, Xiuguo Bao, Di Huang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on HMDB-51, UCF-101 and Something-Something v2 demonstrate that CVPT remarkably outperforms the state-of-the-art counterparts, delivering a much better balance between accuracy and efficiency. |
| Researcher Affiliation | Academia | Bing Li (1, 2), Jiaxin Chen (2), Xiuguo Bao (3), Di Huang (1, 2). (1) SKLSDE, Beihang University, Beijing, China; (2) IRIP Lab, SCSE, Beihang University, Beijing, China; (3) CNCERT/CC, Beijing, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Figure 2 illustrates the framework but is not pseudocode. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | HMDB-51 and UCF-101 are two relatively small datasets, which contain 6,766 videos from 51 action categories and 13,320 videos from 101 categories, respectively. Something-Something v2 (SSv2) is a large-scale motion-centric video dataset, including 168,913 videos for training and 24,777 videos for validation from 174 categories. |
| Dataset Splits | Yes | Something-Something v2 (SSv2) is a large-scale motion-centric video dataset, including 168,913 videos for training and 24,777 videos for validation from 174 categories. |
| Hardware Specification | Yes | We adopt the original model configurations and train the prompt parameters using the AdamW optimizer [35] on 12 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, ViT, and Swin Transformer, but does not provide version numbers for any software dependency. |
| Experiment Setup | Yes | The base learning rate, weight decay and batch size are set to 1 × 10⁻³, 1 × 10⁻⁴ and 240, respectively. Additionally, we adopt a warm-up strategy within the first 5 training epochs. |
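
The hyperparameters quoted in the Experiment Setup and Hardware Specification rows can be collected into a small configuration sketch. The paper only states that a warm-up strategy is used within the first 5 epochs; the linear shape of the ramp, the constant rate afterwards, and the reading of 240 as a global batch split evenly across the 12 GPUs are all assumptions made here for illustration, and the function names are hypothetical.

```python
# Values quoted from the table above.
BASE_LR = 1e-3        # base learning rate
WEIGHT_DECAY = 1e-4   # weight decay for the AdamW optimizer
BATCH_SIZE = 240      # batch size as reported
WARMUP_EPOCHS = 5     # warm-up covers the first 5 training epochs
NUM_GPUS = 12         # NVIDIA V100 GPUs


def warmup_lr(epoch: int,
              base_lr: float = BASE_LR,
              warmup_epochs: int = WARMUP_EPOCHS) -> float:
    """Learning rate for a 0-indexed epoch.

    ASSUMPTION: the ramp is linear and reaches base_lr at the last
    warm-up epoch. The schedule after warm-up is not given in the
    quoted text, so the rate is simply held at base_lr.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr


def per_gpu_batch(batch_size: int = BATCH_SIZE,
                  num_gpus: int = NUM_GPUS) -> int:
    """Per-device batch size.

    ASSUMPTION: the reported batch size is the global one and is
    split evenly across devices.
    """
    if batch_size % num_gpus != 0:
        raise ValueError("batch size does not divide evenly across GPUs")
    return batch_size // num_gpus


if __name__ == "__main__":
    for epoch in range(7):
        print(f"epoch {epoch}: lr = {warmup_lr(epoch):.6f}")
    print(f"per-GPU batch size (assumed even split): {per_gpu_batch()}")
```

In a real training script these values would typically be passed to an optimizer and a learning-rate scheduler. Because the paper reports no software versions, any specific framework call would be a further assumption.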