Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
Authors: Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four action detection benchmarks demonstrate that OV-OAD outperforms other advanced zero-shot methods. |
| Researcher Affiliation | Collaboration | Qingsong Zhao1,2, Yi Wang2, Jilan Xu4,2, Yinan He2, Zifan Song1, Limin Wang3,2, Yu Qiao2, Cairong Zhao1; 1Tongji University, 2Shanghai AI Laboratory, 3Nanjing University, 4Fudan University. {qingsongzhao, zhaocairong}@tongji.edu.cn, {wangyi, heyinan, qiaoyu}@pjlab.org.cn, lmwang.nju@gmail.com |
| Pseudocode | No | The paper describes methods and algorithms but does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | The code will be available for download at https://github.com/OpenGVLab/OV-OAD. |
| Open Datasets | Yes | We use the filtered InternVid-10M-FLT (aka InternVid [42]) and the ActivityNet v1.3 (aka ANet [3]) datasets for training, which originally collected 4M and 14,950 untrimmed video-caption pairs from the web, respectively. |
| Dataset Splits | Yes | For base-to-novel generalization, we integrate three well-known Transformer-based online action detection models, including OadTR [41], LSTR [46], and MAT [38], with a text encoder using the image-text contrastive loss. To ensure statistical significance, we adopted the random sampling setup and dataset partitioning method proposed by [21]. For our experiments, we employed two evaluation settings on the THUMOS'14 dataset, i.e., training on 75% of the action categories and testing on the remaining 25%, and training on 50% of the categories while testing on the remaining 50%. |
| Hardware Specification | Yes | We run experiments on 8 NVIDIA V100 GPUs using PyTorch 1.11.0. |
| Software Dependencies | Yes | We run experiments on 8 NVIDIA V100 GPUs using PyTorch 1.11.0. |
| Experiment Setup | Yes | We train our OV-OAD for 30 epochs with 2 warm-up epochs using the Adam optimizer with weight decay 5e-2. It uses a cosine schedule with a batch size of 256, and the initial learning rate is 1.6e-4. |
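The reported schedule (30 epochs, 2 warm-up epochs, base learning rate 1.6e-4, cosine decay) can be sketched in pure Python. This is a minimal illustration of the quoted hyperparameters, not the authors' code; the exact warm-up shape (assumed linear here) and decay floor (assumed zero) are not specified in the quote.

```python
import math

# Hyperparameters quoted from the paper's experiment setup.
EPOCHS, WARMUP, BASE_LR = 30, 2, 1.6e-4

def lr_at(epoch):
    """Assumed linear warm-up for WARMUP epochs, then cosine decay to zero."""
    if epoch < WARMUP:
        return BASE_LR * (epoch + 1) / WARMUP
    progress = (epoch - WARMUP) / (EPOCHS - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(e) for e in range(EPOCHS)]
```

In a PyTorch training loop, the same shape is typically realized with `torch.optim.lr_scheduler.LambdaLR` wrapping an Adam optimizer constructed with `weight_decay=5e-2`.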
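The base-to-novel evaluation splits the action categories at random, e.g., 75% for training and 25% for testing on THUMOS'14. A minimal sketch of such a category partition follows; the actual protocol comes from [21], the seed is arbitrary, and the 20-class count for THUMOS'14 is an assumption based on the dataset's standard detection setup.

```python
import random

def split_categories(categories, base_frac, seed=0):
    """Randomly partition category names into base (train) and novel (test) sets.
    Mirrors the 75%/25% and 50%/50% settings described for THUMOS'14."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    cats = sorted(categories)          # sort first so the shuffle is deterministic
    rng.shuffle(cats)
    n_base = round(len(cats) * base_frac)
    return cats[:n_base], cats[n_base:]

# Hypothetical class list; THUMOS'14's detection subset has 20 action classes.
classes = [f"class_{i}" for i in range(20)]
base, novel = split_categories(classes, 0.75)
```

Repeating the split over several seeds and averaging results is the usual way to make such random category sampling statistically meaningful.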