Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Authors: Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate MoTE achieves an optimal trade-off between zero-shot and close-set performance with one unified model. Thorough ablation studies show the scalability and effectiveness of our proposed method (§4). |
| Researcher Affiliation | Academia | Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen; Tongji University, Shanghai, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ZMHH-H/MoTE. |
| Open Datasets | Yes | We fine-tune our model using the Kinetics-400 [15] dataset as in previous works [28]... Zero-shot: Following previous works [28, 34], we evaluate zero-shot performance on UCF-101 [38], HMDB-51 [19], and Kinetics-600 [3]. |
| Dataset Splits | Yes | Kinetics-400 [15] is a large-scale dataset in the video domain. The dataset contains 240k training videos and 20k validation videos in 400 human action categories... UCF-101 [38]: There are three official splits of training data and validation data. HMDB-51 [19]: There are three official splits of the dataset, each with 3,570 training data and 1,530 validation data. |
| Hardware Specification | Yes | We conduct experiments with 3 NVIDIA GeForce RTX 4090. |
| Software Dependencies | No | The paper lists 'AdamW' as the optimizer but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In Table 6, we present the hyper-parameters set for optimization. ... Batch size: 144; Optimizer: AdamW; Weight decay: 0.2; Adam β1, β2: 0.9, 0.999; Learning rate (Base): 5e-5; Learning rate (CLIP layers): 3e-6; Learning rate decay: Cosine schedule; Training epochs: 30 (ViT-B), 20 (ViT-L); Linear warm-up epochs: 5 |
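
The schedule named in the Experiment Setup row (linear warm-up for 5 epochs, then cosine decay) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the function name `lr_at_epoch`, the warm-up starting value, the final learning rate of 0, and the per-epoch (rather than per-step) update are all assumptions that the table does not specify.

```python
import math

# Values quoted in the Experiment Setup row (ViT-B configuration).
BASE_LR = 5e-5        # "Learning rate (Base)"
CLIP_LR = 3e-6        # "Learning rate (CLIP layers)"
TOTAL_EPOCHS = 30     # 20 for ViT-L
WARMUP_EPOCHS = 5


def lr_at_epoch(epoch: int, peak_lr: float,
                total_epochs: int = TOTAL_EPOCHS,
                warmup_epochs: int = WARMUP_EPOCHS,
                min_lr: float = 0.0) -> float:
    """Linear warm-up followed by cosine decay, evaluated once per epoch.

    Assumptions (not stated in the table): warm-up starts at
    peak_lr / warmup_epochs, decay ends near min_lr = 0, and the
    rate changes per epoch rather than per optimizer step.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch out of range")
    if epoch < warmup_epochs:
        # Ramp linearly so the last warm-up epoch reaches peak_lr.
        return peak_lr * (epoch + 1) / warmup_epochs
    # Fraction of the decay phase completed so far, in [0, 1).
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print both parameter-group rates at a few sample epochs.
    for epoch in (0, 4, 5, 17, 29):
        print(epoch, lr_at_epoch(epoch, BASE_LR), lr_at_epoch(epoch, CLIP_LR))
```

Both parameter groups follow the same schedule shape and differ only in peak value, which is one common way to apply a lower rate to pretrained layers. Whether the paper scales the two groups this way is not confirmed by the table.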