MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Authors: Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate MoTE achieves an optimal trade-off between zero-shot and close-set performance with one unified model. Thorough ablation studies show the scalability and effectiveness of our proposed method (Sec. 4). |
| Researcher Affiliation | Academia | Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen (Tongji University, Shanghai, China) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ZMHH-H/MoTE. |
| Open Datasets | Yes | We fine-tune our model using the Kinetics-400 [15] dataset as in previous works [28]... Zero-shot: Following previous works [28, 34], we evaluate zero-shot performance on UCF-101 [38], HMDB-51 [19], and Kinetics-600 [3]. |
| Dataset Splits | Yes | Kinetics-400 [15] is a large-scale dataset in the video domain. The dataset contains 240k training videos and 20k validation videos covering 400 human action categories... UCF-101 [38]: there are three official training/validation splits. HMDB-51 [19]: there are three official splits of the dataset, each with 3,570 training videos and 1,530 validation videos. |
| Hardware Specification | Yes | We conduct experiments with 3 NVIDIA GeForce RTX 4090 GPUs. |
| Software Dependencies | No | The paper lists AdamW as the optimizer but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In Table 6, we present the hyper-parameters set for optimization: batch size 144; optimizer AdamW; weight decay 0.2; Adam β1, β2 = 0.9, 0.999; learning rate (base) 5e-5; learning rate (CLIP layers) 3e-6; cosine learning-rate decay; 30 training epochs (ViT-B), 20 (ViT-L); 5 linear warm-up epochs. A hedged configuration sketch follows this table. |
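
For readers who want to mirror this setup, below is a minimal PyTorch sketch of the Table 6 optimization recipe (AdamW, weight decay 0.2, betas 0.9/0.999, base learning rate 5e-5, CLIP-layer learning rate 3e-6, linear warm-up followed by cosine decay). The parameter-group split, the `clip.` name prefix, and the epoch-level scheduler are assumptions made for illustration, not the authors' released code.

```python
import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer(model, base_lr=5e-5, clip_lr=3e-6, weight_decay=0.2):
    # Assumed split: parameters whose names start with "clip." are treated as
    # the pretrained CLIP layers (lower LR); everything else uses the base LR.
    # The "clip." prefix is a placeholder and depends on the actual code base.
    clip_params, base_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (clip_params if name.startswith("clip.") else base_params).append(param)
    return AdamW(
        [
            {"params": base_params, "lr": base_lr},
            {"params": clip_params, "lr": clip_lr},
        ],
        betas=(0.9, 0.999),
        weight_decay=weight_decay,
    )

def build_scheduler(optimizer, total_epochs=30, warmup_epochs=5):
    # Linear warm-up for the first epochs, then cosine decay to zero,
    # matching the schedule reported in Table 6.
    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)
```

With a ViT-B backbone one would pass `total_epochs=30`; the ViT-L runs reported in the paper use 20 epochs with the same 5 warm-up epochs.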