Compressed Video Contrastive Learning
Authors: Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on two downstream tasks show that our MVCGC yields new state-of-the-art while being significantly more efficient than its competitors. |
| Researcher Affiliation | Academia | 1Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 2Beijing Key Laboratory of Big Data Management and Analysis Methods 3School of Information, Renmin University of China, Beijing, China 4The University of Hong Kong, Pokfulam, Hong Kong, China |
| Pseudocode | Yes | Algorithm 1 Motion Vector based Cross Guidance Contrastive Learning (MVCGC) |
| Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In this paper, we use UCF101 [Soomro et al., 2012] and Kinetics-400 (K400) [Kay et al., 2017] for self-supervised pre-training. ...we benchmark downstream evaluation tasks on the first test set of UCF101, and the test split 1 of HMDB51 [Kuehne et al., 2011], a relatively small action dataset containing 6,766 videos with 51 categories. |
| Dataset Splits | Yes | K400 is a larger dataset consisting of 400 human action classes and has 230k/20k clips for training/validation, respectively. ... UCF101 contains 13,320 videos with 101 action classes and has three standard training/test splits. ... HMDB51 [Kuehne et al., 2011], a relatively small action dataset containing 6,766 videos with 51 categories. |
| Hardware Specification | Yes | All experiments are trained on 4 Titan RTX GPUs, with a batch size of 32 samples per GPU. All methods are measured in exactly the same environment: Intel Xeon 5118 CPUs and a Titan RTX GPU. |
| Software Dependencies | No | The paper mentions 'pyav' and 'FFmpeg libraries' but does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For the pre-training on UCF101, temperature τ = 0.07, momentum m = 0.999 and queue size 2048 are used, while queue size is set to 16384 on K400. When pre-training on UCF101, the initialization stage lasts 300 epochs for each stream, and we then continually train the cross guidance for another 200 epochs. On K400, we train 200 epochs for each stream in the initialization stage and 50 epochs for cross guidance contrastive learning. 100 and 500 epochs are used for linear and fully fine-tuning, respectively. We use the Adam optimizer with a 1e-4 learning rate and 1e-5 weight decay for pre-training and the SGD optimizer with a 1e-1 learning rate and 1e-3 weight decay for fine-tuning. The learning rate is decayed down by 1/10 twice when the validation loss plateaus. The hyper-parameter k in MVCGC is set as 5 according to the ablation study. All experiments are trained on 4 Titan RTX GPUs, with a batch size of 32 samples per GPU. |
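The quoted setup follows the MoCo recipe (momentum m = 0.999, negative queue of 2048, temperature τ = 0.07, plateau-triggered lr decay by 1/10). As a minimal, framework-free sketch of those mechanics, assuming standard MoCo-style momentum-encoder and queue semantics (all function and class names here are hypothetical, not from the paper):

```python
from collections import deque

# Hyperparameters quoted from the paper's UCF101 pre-training setup.
MOMENTUM = 0.999      # key-encoder momentum m
QUEUE_SIZE = 2048     # negative queue size (16384 is used on K400)
TEMPERATURE = 0.07    # InfoNCE softmax temperature tau


def momentum_update(key_params, query_params, m=MOMENTUM):
    """MoCo-style EMA update of the key encoder:
    theta_k <- m * theta_k + (1 - m) * theta_q, applied element-wise."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class NegativeQueue:
    """FIFO dictionary of encoded keys used as negatives;
    when full, the oldest keys are evicted as new batches arrive."""

    def __init__(self, size=QUEUE_SIZE):
        self.buf = deque(maxlen=size)

    def enqueue(self, keys):
        self.buf.extend(keys)

    def __len__(self):
        return len(self.buf)


def decay_lr(lr, factor=0.1):
    """Plateau schedule from the paper: divide the learning rate by 10
    (applied twice over training) when the validation loss stops improving."""
    return lr * factor
```

With the quoted pre-training lr of 1e-4, applying `decay_lr` twice yields 1e-6; the queue length never exceeds `QUEUE_SIZE` regardless of how many keys are enqueued.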