Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

Authors: Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four action detection benchmarks demonstrate that OV-OAD outperforms other advanced zero-shot methods.
Researcher Affiliation | Collaboration | Qingsong Zhao (1,2), Yi Wang (2), Jilan Xu (4,2), Yinan He (2), Zifan Song (1), Limin Wang (3,2), Yu Qiao (2), Cairong Zhao (1). Affiliations: (1) Tongji University, (2) Shanghai AI Laboratory, (3) Nanjing University, (4) Fudan University.
Pseudocode | No | The paper describes methods and algorithms but does not include a formal pseudocode or algorithm block.
Open Source Code | Yes | The code will be available for download at https://github.com/OpenGVLab/OV-OAD.
Open Datasets | Yes | We use the filtered InternVid-10M-FLT (aka, InternVid [42]) and the ActivityNet v1.3 (aka, ANet [3]) datasets for training, which are originally collected 4M and 14950 untrimmed video-caption pairs from the web, respectively.
Dataset Splits | Yes | For base-to-novel generalization, we integrate three well-known Transformer-based online action detection models including OadTR [41], LSTR [46] and MAT [38] with a text encoder using the image-text contrastive loss. To ensure statistical significance, we adopted the random sampling setup and dataset partitioning method proposed by [21]. For our experiments, we employed two evaluation settings on the THUMOS'14 dataset, i.e., training on 75% of the action categories and testing on the remaining 25%, and training on 50% of the categories while testing on the remaining 50%.
Hardware Specification | Yes | Our run experiments on NVIDIA V100 × 8 using Pytorch 1.11.0.
Software Dependencies | Yes | Our run experiments on NVIDIA V100 × 8 using Pytorch 1.11.0.
Experiment Setup | Yes | We train our OV-OAD for 30 epochs with 2 warm-up epochs using the Adam optimizer with weight decay 5e-2. It uses a cosine schedule with a batch size of 256, and the initial learning rate is 1.6e-4.
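
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a short sketch. It is written in plain Python rather than with PyTorch's scheduler classes, and it is not the authors' implementation. The quoted text gives only the epoch counts, the initial learning rate, and the phrase "cosine schedule". The linear warm-up from zero, the decay to a minimum of zero, and the name `lr_at_epoch` are assumptions made here for illustration. The initial learning rate is read as 1.6e-4.

```python
import math

# Values quoted in the Experiment Setup row.
TOTAL_EPOCHS = 30
WARMUP_EPOCHS = 2
BASE_LR = 1.6e-4


def lr_at_epoch(epoch: float,
                base_lr: float = BASE_LR,
                warmup_epochs: int = WARMUP_EPOCHS,
                total_epochs: int = TOTAL_EPOCHS,
                min_lr: float = 0.0) -> float:
    """Return the learning rate at a (possibly fractional) epoch.

    Assumptions not stated in the source: the warm-up is linear and
    starts at 0, and the cosine phase decays to min_lr at total_epochs.
    """
    if epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr over the warm-up epochs.
        return base_lr * epoch / warmup_epochs
    # Fraction of the cosine phase completed, clamped to [0, 1].
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    progress = min(progress, 1.0)
    # Half-cosine decay from base_lr down to min_lr.
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 1, 2, 16, 30):
        print(f"epoch {e:2d}: lr = {lr_at_epoch(e):.2e}")
```

Under these assumptions the rate peaks at the end of warm-up (epoch 2) and falls to half of its peak midway through the cosine phase (epoch 16). A per-iteration schedule would call the same function with fractional epochs.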