Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
Authors: Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four action detection benchmarks demonstrate that OV-OAD outperforms other advanced zero-shot methods. |
| Researcher Affiliation | Collaboration | Qingsong Zhao (1,2), Yi Wang (2), Jilan Xu (4,2), Yinan He (2), Zifan Song (1), Limin Wang (3,2), Yu Qiao (2), Cairong Zhao (1). Affiliations: (1) Tongji University, (2) Shanghai AI Laboratory, (3) Nanjing University, (4) Fudan University. |
| Pseudocode | No | The paper describes methods and algorithms but does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | The code will be available for download at https://github.com/OpenGVLab/OV-OAD. |
| Open Datasets | Yes | We use the filtered InternVid-10M-FLT (aka, InternVid [42]) and the ActivityNet v1.3 (aka, ANet [3]) datasets for training, which are originally collected 4M and 14950 untrimmed video-caption pairs from the web, respectively. |
| Dataset Splits | Yes | For base-to-novel generalization, we integrate three well-known Transformer-based online action detection models including OadTR [41], LSTR [46] and MAT [38] with a text encoder using the image-text contrastive loss. To ensure statistical significance, we adopted the random sampling setup and dataset partitioning method proposed by [21]. For our experiments, we employed two evaluation settings on the THUMOS'14 dataset, i.e., training on 75% of the action categories and testing on the remaining 25%, and training on 50% of the categories while testing on the remaining 50%. |
| Hardware Specification | Yes | We run experiments on 8 NVIDIA V100 GPUs using PyTorch 1.11.0. |
| Software Dependencies | Yes | We run experiments on 8 NVIDIA V100 GPUs using PyTorch 1.11.0. |
| Experiment Setup | Yes | We train our OV-OAD for 30 epochs with 2 warm-up epochs using the Adam optimizer with weight decay 5e-2. It uses a cosine schedule with a batch size of 256, and the initial learning rate is 1.6e-4. |
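
To make the Experiment Setup row concrete, the following is a minimal sketch of a warm-up plus cosine learning-rate schedule using the values quoted above (30 epochs, 2 warm-up epochs, initial learning rate 1.6e-4). The linear shape of the warm-up, the per-epoch granularity, the final learning rate of zero, and the names `lr_at_epoch` and `MIN_LR` are assumptions made for illustration; the quoted text does not specify them, and this is not the authors' implementation.

```python
import math

BASE_LR = 1.6e-4     # initial learning rate, from the Experiment Setup row
TOTAL_EPOCHS = 30    # from the Experiment Setup row
WARMUP_EPOCHS = 2    # from the Experiment Setup row
MIN_LR = 0.0         # assumption: the floor of the cosine decay is not stated


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate for a zero-indexed epoch (hypothetical helper)."""
    if epoch < WARMUP_EPOCHS:
        # Assumption: linear warm-up that reaches BASE_LR at the last warm-up epoch.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay from BASE_LR toward MIN_LR over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


schedule = [lr_at_epoch(e) for e in range(TOTAL_EPOCHS)]

if __name__ == "__main__":
    for epoch, lr in enumerate(schedule):
        print(f"epoch {epoch:2d}: lr = {lr:.3e}")
```

In an actual training run the schedule would typically be stepped per iteration rather than per epoch, and the optimizer would carry the weight decay of 5e-2 reported above; both details are left out of this sketch.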