Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
Authors: Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four action detection benchmarks demonstrate that OV-OAD outperforms other advanced zero-shot methods. |
| Researcher Affiliation | Collaboration | Qingsong Zhao1,2, Yi Wang2, Jilan Xu4,2, Yinan He2, Zifan Song1, Limin Wang3,2, Yu Qiao2, Cairong Zhao1; 1Tongji University, 2Shanghai AI Laboratory, 3Nanjing University, 4Fudan University. {qingsongzhao, zhaocairong}@tongji.edu.cn, {wangyi, heyinan, qiaoyu}@pjlab.org.cn, lmwang.nju@gmail.com |
| Pseudocode | No | The paper describes methods and algorithms but does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | The code will be available for download at https://github.com/OpenGVLab/OV-OAD. |
| Open Datasets | Yes | We use the filtered InternVid-10M-FLT (aka InternVid [42]) and the ActivityNet v1.3 (aka ANet [3]) datasets for training, which originally collected 4M and 14,950 untrimmed video-caption pairs from the web, respectively. |
| Dataset Splits | Yes | For base-to-novel generalization, we integrate three well-known Transformer-based online action detection models, including OadTR [41], LSTR [46], and MAT [38], with a text encoder using the image-text contrastive loss. To ensure statistical significance, we adopted the random sampling setup and dataset partitioning method proposed by [21]. For our experiments, we employed two evaluation settings on the THUMOS'14 dataset, i.e., training on 75% of the action categories and testing on the remaining 25%, and training on 50% of the categories while testing on the remaining 50%. |
| Hardware Specification | Yes | We run experiments on 8 NVIDIA V100 GPUs using PyTorch 1.11.0. |
| Software Dependencies | Yes | We run experiments on 8 NVIDIA V100 GPUs using PyTorch 1.11.0. |
| Experiment Setup | Yes | We train our OV-OAD for 30 epochs with 2 warm-up epochs using the Adam optimizer with weight decay 5e-2. It uses a cosine schedule with a batch size of 256, and the initial learning rate is 1.6e-4. |
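The reported schedule (30 epochs, 2 warm-up epochs, base learning rate 1.6e-4, cosine decay) can be sketched in pure Python. This is a minimal illustration of the quoted hyperparameters, not the authors' code; the exact warm-up shape (assumed linear here) and decay floor (assumed zero) are not specified in the quote.

```python
import math

# Hyperparameters quoted from the paper's experiment setup.
EPOCHS, WARMUP, BASE_LR = 30, 2, 1.6e-4

def lr_at(epoch):
    """Assumed linear warm-up for WARMUP epochs, then cosine decay to zero."""
    if epoch < WARMUP:
        return BASE_LR * (epoch + 1) / WARMUP
    progress = (epoch - WARMUP) / (EPOCHS - WARMUP)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(e) for e in range(EPOCHS)]
```

In a PyTorch training loop, the same shape is typically realized with `torch.optim.lr_scheduler.LambdaLR` wrapping an Adam optimizer constructed with `weight_decay=5e-2`.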
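The base-to-novel evaluation splits the action categories at random, e.g., 75% for training and 25% for testing on THUMOS'14. A minimal sketch of such a category partition follows; the actual protocol comes from [21], the seed is arbitrary, and the 20-class count for THUMOS'14 is an assumption based on the dataset's standard detection setup.

```python
import random

def split_categories(categories, base_frac, seed=0):
    """Randomly partition category names into base (train) and novel (test) sets.
    Mirrors the 75%/25% and 50%/50% settings described for THUMOS'14."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    cats = sorted(categories)          # sort first so the shuffle is deterministic
    rng.shuffle(cats)
    n_base = round(len(cats) * base_frac)
    return cats[:n_base], cats[n_base:]

# Hypothetical class list; THUMOS'14's detection subset has 20 action classes.
classes = [f"class_{i}" for i in range(20)]
base, novel = split_categories(classes, 0.75)
```

Repeating the split over several seeds and averaging results is the usual way to make such random category sampling statistically meaningful.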