MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Authors: Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate MoTE achieves an optimal trade-off between zero-shot and close-set performance with one unified model. Thorough ablation studies show the scalability and effectiveness of our proposed method (Sec. 4). |
| Researcher Affiliation | Academia | Minghao Zhu, Zhengpu Wang, Mengxian Hu, Ronghao Dang, Xiao Lin, Xun Zhou, Chengju Liu, Qijun Chen (Tongji University, Shanghai, China) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/ZMHH-H/MoTE. |
| Open Datasets | Yes | We fine-tune our model using the Kinetics-400 [15] dataset as in previous works [28]... Zero-shot: Following previous works [28, 34], we evaluate zero-shot performance on UCF-101 [38], HMDB-51 [19], and Kinetics-600 [3]. |
| Dataset Splits | Yes | Kinetics-400 [15] is a large-scale dataset in the video domain. The dataset contains 240k training videos and 20k validation videos covering 400 human action categories... UCF-101 [38]: there are three official training/validation splits. HMDB-51 [19]: there are three official splits of the dataset, each with 3,570 training videos and 1,530 validation videos. |
| Hardware Specification | Yes | We conduct experiments with 3 NVIDIA GeForce RTX 4090 GPUs. |
| Software Dependencies | No | The paper lists AdamW as the optimizer but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | In Table 6, we present the hyper-parameters set for optimization: batch size 144; optimizer AdamW; weight decay 0.2; Adam β1, β2 = 0.9, 0.999; learning rate (base) 5e-5; learning rate (CLIP layers) 3e-6; cosine learning-rate decay; 30 training epochs (ViT-B), 20 (ViT-L); 5 linear warm-up epochs. A hedged configuration sketch follows this table. |
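
For readers who want to mirror this setup, below is a minimal PyTorch sketch of the Table 6 optimization recipe (AdamW, weight decay 0.2, betas 0.9/0.999, base learning rate 5e-5, CLIP-layer learning rate 3e-6, linear warm-up followed by cosine decay). The parameter-group split, the `clip.` name prefix, and the epoch-level scheduler are assumptions made for illustration, not the authors' released code.

```python
import math
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def build_optimizer(model, base_lr=5e-5, clip_lr=3e-6, weight_decay=0.2):
    # Assumed split: parameters whose names start with "clip." are treated as
    # the pretrained CLIP layers (lower LR); everything else uses the base LR.
    # The "clip." prefix is a placeholder and depends on the actual code base.
    clip_params, base_params = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (clip_params if name.startswith("clip.") else base_params).append(param)
    return AdamW(
        [
            {"params": base_params, "lr": base_lr},
            {"params": clip_params, "lr": clip_lr},
        ],
        betas=(0.9, 0.999),
        weight_decay=weight_decay,
    )

def build_scheduler(optimizer, total_epochs=30, warmup_epochs=5):
    # Linear warm-up for the first epochs, then cosine decay to zero,
    # matching the schedule reported in Table 6.
    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)
```

With a ViT-B backbone one would pass `total_epochs=30`; the ViT-L runs reported in the paper use 20 epochs with the same 5 warm-up epochs.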