Self-Supervised Learning of Compressed Video Representations

Authors: Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our approach achieves competitive performance on compressed video recognition both in supervised and self-supervised regimes. ... 3 EXPERIMENTS ... Table 1 summarizes the results. ... Table 3 summarizes the results.
Researcher Affiliation | Collaboration | Youngjae Yu, Sangho Lee, Gunhee Kim (Seoul National University) {yj.yu,sangho.lee}@vision.snu.ac.kr, gunhee@snu.ac.kr; Yale Song (Microsoft Research) yalesong@microsoft.com
Pseudocode | Yes | Algorithm 1: Self-supervision label for Pyramidal Motion Statistics Prediction
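The paper's Algorithm 1 is not reproduced in this report, so the following is only an illustrative sketch of how a pyramidal motion-statistics label might be derived from a decoded motion-vector field. It assumes the statistic is the mean motion magnitude pooled over a spatial pyramid and quantized into discrete classes; the function name, pyramid depth, and bin thresholds are hypothetical and not taken from the paper.

```python
import numpy as np

def pyramidal_motion_statistics_label(motion_vectors, levels=3,
                                      bin_edges=(0.5, 1.5, 3.0)):
    """Hypothetical sketch of a pyramidal motion-statistics self-supervision label.

    motion_vectors: array of shape (H, W, 2) with per-block (dx, dy) motion
    vectors decoded from the compressed stream.
    Returns one integer class label per pyramid cell, coarse to fine.
    """
    magnitude = np.linalg.norm(motion_vectors, axis=-1)  # per-block motion magnitude, (H, W)
    h, w = magnitude.shape
    edges = np.asarray(bin_edges)

    labels = []
    for level in range(levels):
        cells = 2 ** level  # 1x1, 2x2, 4x4, ... grid at this pyramid level
        for i in range(cells):
            for j in range(cells):
                cell = magnitude[i * h // cells:(i + 1) * h // cells,
                                 j * w // cells:(j + 1) * w // cells]
                mean_mag = float(cell.mean()) if cell.size else 0.0
                # Quantize the cell statistic into a discrete class label.
                labels.append(int(np.digitize(mean_mag, edges)))
    return labels

# Usage sketch on a random 14x14 motion-vector field:
# labels = pyramidal_motion_statistics_label(np.random.randn(14, 14, 2))
```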
Open Source Code | No | No explicit statement about the authors providing open-source code for their methodology, or a link to a code repository, was found.
Open Datasets | Yes | We pretrain our model on Kinetics-400 (Kay et al., 2017). For evaluation, we finetune the pretrained model for action recognition using UCF-101 (Soomro et al., 2012) and HMDB-51 (Kuehne et al., 2011).
Dataset Splits | Yes | We use the standard training and evaluation protocols for both UCF-101 (Soomro et al., 2012) and HMDB-51 (Kuehne et al., 2011).
Hardware Specification | Yes | We use 4 NVIDIA Tesla V100 GPUs and use a batch size of 100. ... Table 2 shows per-frame runtime speed (ms) and GFLOPs measured on an NVIDIA Tesla P100 GPU with Intel E5-2698 v4 CPUs.
Software Dependencies | No | No specific software versions (e.g., Python, PyTorch, CUDA versions) were mentioned, only general software components such as '3D ResNet' and 'SGD'.
Experiment Setup | Yes | We pretrain our model end-to-end from scratch for 20 epochs, including the initial warm-up period of 5 epochs. For downstream scenarios, we finetune our model for 500 epochs for UCF-101 and for 300 epochs for HMDB-51, including the warm-up period of 30 epochs. For both the pretraining and finetuning stages, we use SGD with momentum 0.9, weight decay 10^-4, and half-period cosine learning rate schedule. We use 4 NVIDIA Tesla V100 GPUs and use a batch size of 100.
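For reference, a minimal PyTorch-style sketch of the quoted optimization setup (SGD with momentum 0.9, weight decay 10^-4, linear warm-up followed by a half-period cosine schedule, 20 pretraining epochs with 5 warm-up epochs) is shown below. The base learning rate, the stand-in model, and the exact warm-up formulation are assumptions, since the report quotes only the items listed above.

```python
import math
import torch

# Assumed placeholders: the quote does not give the base LR or the backbone.
model = torch.nn.Linear(512, 400)   # stand-in for the paper's 3D ResNet backbone
base_lr = 0.1                       # hypothetical base learning rate
warmup_epochs, total_epochs = 5, 20

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0.9, weight_decay=1e-4)

def lr_at_epoch(epoch):
    """Linear warm-up, then a half-period cosine decay toward zero."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: lr_at_epoch(e) / base_lr)

for epoch in range(total_epochs):
    # ... iterate over mini-batches of size 100 and call optimizer.step() ...
    scheduler.step()
```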