BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Authors: Yizhao Gao, Nanyi Fei, Haoyu Lu, Zhiwu Lu, Hao Jiang, Yijie Li, Zhao Cao

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive results show that our BMU-MoCo remarkably outperforms recent competitors w.r.t. video-text retrieval performance and forgetting rate, even without using any extra data or dynamic networks.
Researcher Affiliation | Collaboration | Yizhao Gao, Nanyi Fei, Haoyu Lu, Zhiwu Lu (Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods); Hao Jiang, Yijie Li, Zhao Cao (Huawei Poisson Lab, Hangzhou, Zhejiang, China)
Pseudocode | Yes | The full (pseudocode) algorithm of our BMU-MoCo is presented in the supplementary material. (A generic momentum-update sketch is given after this table.)
Open Source Code | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No]
Open Datasets | Yes | Under our CVLM setting, models are supposed to be sequentially trained on five widely-used video-text datasets: VATEX [54], ActivityNet [25], MSR-VTT [55], DiDeMo [20], and MSVD [10].
Dataset Splits | Yes | VATEX [54] is a large-scale open-domain dataset, which has 25,991 videos with 250K text descriptions for training, 3,000 videos for validation and 6,000 videos for testing.
Hardware Specification | Yes | The total training time on five tasks is around 20 hours with 8 Tesla V100 GPUs for each model.
Software Dependencies | No | The paper mentions using specific pre-trained models like 'ViT-Base [13]/BERT-Base [12]' as encoders, but it does not provide specific version numbers for software libraries, frameworks, or dependencies (e.g., PyTorch 1.9, CUDA 11.1).
Experiment Setup | Yes | For the first epoch of each task under our CVLM setting, we set the learning rate to 5e-5 and decay it to 5e-6 afterwards. (3) We select the two momentum coefficients m = 0.99, m̂ = 0.99, and the temperature τ = 0.07. We set the batch size N_B to 48 and the queue size N_Q to 1,440. (4) The total training time on five tasks is around 20 hours with 8 Tesla V100 GPUs for each model. (These hyperparameters are collected into the configuration sketch below.)
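
Because the full BMU-MoCo algorithm is only given as pseudocode in the supplementary material and no code is released, the following is a minimal, hedged sketch of a generic MoCo-style momentum (EMA) encoder update with a queue-based InfoNCE loss in PyTorch, using the reported hyperparameters (m = 0.99, τ = 0.07, N_B = 48, N_Q = 1,440). It is not the authors' implementation, and the bidirectional update over the two momentum coefficients (m and m̂) is not reproduced here; all class and function names are hypothetical.

```python
# Minimal sketch (NOT the paper's BMU-MoCo): a generic MoCo-style momentum
# (EMA) encoder update plus a queue-based InfoNCE loss, using the reported
# hyperparameters m = 0.99, tau = 0.07, N_Q = 1,440. Names are hypothetical.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MomentumEncoderPair(nn.Module):
    def __init__(self, encoder: nn.Module, m: float = 0.99):
        super().__init__()
        self.m = m
        self.encoder_q = encoder                 # online (query) encoder, trained by backprop
        self.encoder_k = copy.deepcopy(encoder)  # momentum (key) encoder, updated by EMA only
        for p in self.encoder_k.parameters():
            p.requires_grad = False

    @torch.no_grad()
    def momentum_update(self):
        # theta_k <- m * theta_k + (1 - m) * theta_q
        for pq, pk in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            pk.data.mul_(self.m).add_(pq.data, alpha=1.0 - self.m)

def info_nce(q, k, queue, tau: float = 0.07):
    """InfoNCE loss with one positive key per query and a queue of negatives."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    l_pos = (q * k).sum(dim=-1, keepdim=True)    # (N_B, 1) positive logits
    l_neg = q @ queue.t()                        # (N_B, N_Q) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage with the reported batch size N_B = 48 and queue size N_Q = 1,440.
encoder = nn.Linear(512, 256)
pair = MomentumEncoderPair(encoder, m=0.99)
q = pair.encoder_q(torch.randn(48, 512))
with torch.no_grad():
    k = pair.encoder_k(torch.randn(48, 512))
queue = F.normalize(torch.randn(1440, 256), dim=-1)
loss = info_nce(q, k, queue, tau=0.07)
loss.backward()          # gradients flow only into encoder_q
pair.momentum_update()   # EMA step for encoder_k
```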
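
For quick reference, the training setup quoted in the rows above can be collected into one configuration sketch. The dictionary below only restates the reported numbers; the key names and structure are illustrative, not the authors' actual configuration schema.

```python
# Illustrative consolidation of the reported CVLM training setup; the key
# names are hypothetical and only the values come from the paper.
CVLM_TRAINING_SETUP = {
    "datasets": ["VATEX", "ActivityNet", "MSR-VTT", "DiDeMo", "MSVD"],  # trained sequentially
    "encoders": {"video": "ViT-Base", "text": "BERT-Base"},
    "lr_first_epoch": 5e-5,            # learning rate for the first epoch of each task
    "lr_decayed": 5e-6,                # learning rate afterwards
    "momentum_m": 0.99,                # m
    "momentum_m_hat": 0.99,            # m̂
    "temperature_tau": 0.07,
    "batch_size": 48,                  # N_B
    "queue_size": 1440,                # N_Q
    "hardware": "8x Tesla V100 GPUs",
    "total_training_hours": 20,        # approximate, over all five tasks
}
```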