Compressed Video Contrastive Learning

Authors: Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on two downstream tasks show that our MVCGC yields new state-of-the-art while being significantly more efficient than its competitors.
Researcher Affiliation | Academia | (1) Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; (2) Beijing Key Laboratory of Big Data Management and Analysis Methods; (3) School of Information, Renmin University of China, Beijing, China; (4) The University of Hong Kong, Pokfulam, Hong Kong, China
Pseudocode | Yes | Algorithm 1: Motion Vector based Cross Guidance Contrastive Learning (MVCGC)
Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | In this paper, we use UCF101 [Soomro et al., 2012] and Kinetics-400 (K400) [Kay et al., 2017] for self-supervised pre-training. ... we benchmark downstream evaluation tasks on the first test set of UCF101, and the test split 1 of HMDB51 [Kuehne et al., 2011], a relatively small action dataset containing 6,766 videos with 51 categories.
Dataset Splits | Yes | K400 is a larger dataset consisting of 400 human action classes and has 230k/20k clips for training/validation, respectively. ... UCF101 contains 13,320 videos with 101 action classes and has three standard training/test splits. ... HMDB51 [Kuehne et al., 2011] is a relatively small action dataset containing 6,766 videos with 51 categories.
Hardware Specification | Yes | All experiments are trained on 4 Titan RTX GPUs, with a batch size of 32 samples per GPU. All methods are measured in exactly the same environment: Intel Xeon 5118 CPUs and a Titan RTX GPU.
Software Dependencies | No | The paper mentions 'pyav' and 'FFmpeg libraries' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | For the pre-training on UCF101, temperature τ = 0.07, momentum m = 0.999 and queue size 2048 are used, while the queue size is set to 16384 on K400. When pre-training on UCF101, the initialization stage lasts 300 epochs for each stream, and we then continually train the cross guidance for another 200 epochs. On K400, we train 200 epochs for each stream in the initialization stage and 50 epochs for cross guidance contrastive learning. 100 and 500 epochs are used for linear probing and full fine-tuning, respectively. We use the Adam optimizer with a 1e-4 learning rate and 1e-5 weight decay for pre-training, and the SGD optimizer with a 1e-1 learning rate and 1e-3 weight decay for fine-tuning. The learning rate is decayed by 1/10 twice when the validation loss plateaus. The hyper-parameter k in MVCGC is set to 5 according to the ablation study. All experiments are trained on 4 Titan RTX GPUs, with a batch size of 32 samples per GPU.
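The Experiment Setup row quotes MoCo-style hyper-parameters (temperature τ = 0.07, momentum m = 0.999, queue size 2048 on UCF101). The sketch below illustrates how those three quantities typically interact in momentum-contrast pre-training: an InfoNCE loss over one positive key and a FIFO queue of negatives, plus an exponential-moving-average update of the key encoder. This is a generic illustration under those assumptions, not the authors' MVCGC implementation; all function names are hypothetical.

```python
import numpy as np

TAU = 0.07          # softmax temperature quoted in the paper
M = 0.999           # momentum for the key-encoder EMA update
QUEUE_SIZE = 2048   # negatives queue (UCF101 setting; 16384 on K400)

def momentum_update(query_params, key_params, m=M):
    """EMA update of the key encoder: key <- m*key + (1-m)*query."""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]

def info_nce_loss(q, k_pos, queue, tau=TAU):
    """InfoNCE over one positive key and a queue of negatives.

    q, k_pos: L2-normalized (D,) embeddings; queue: (K, D) negatives.
    """
    l_pos = q @ k_pos                              # scalar positive logit
    l_neg = queue @ q                              # (K,) negative logits
    logits = np.concatenate([[l_pos], l_neg]) / tau
    logits -= logits.max()                         # numerical stability
    # Cross-entropy with the positive at index 0.
    return -(logits[0] - np.log(np.exp(logits).sum()))

def enqueue(queue, keys):
    """FIFO queue update: prepend new keys, drop the oldest overflow."""
    return np.concatenate([keys, queue], axis=0)[:QUEUE_SIZE]
```

In the cross-guidance setting described by the paper, the query and key embeddings would come from the two streams (RGB frames and motion vectors); the loss, momentum update, and queue mechanics above are stream-agnostic.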