Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles

Authors: Dahun Kim, Donghyeon Cho, In So Kweon

AAAI 2019, pp. 8545-8552 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In experiments, we demonstrate that our learned 3D representation is well transferred to action recognition tasks, and outperforms state-of-the-art 2D CNN-based competitors on UCF101 and HMDB51 datasets." |
| Researcher Affiliation | Academia | "Dahun Kim, Donghyeon Cho, In So Kweon. Dept. of Electrical Engineering, KAIST, Daejeon, Korea. mcahny@kaist.ac.kr, cdh12242@gmail.com, iskweon77@kaist.ac.kr" |
| Pseudocode | No | The paper does not contain any sections explicitly labeled "Pseudocode" or "Algorithm", nor are there structured code-like blocks. |
| Open Source Code | No | "All the pre-trained models and the source codes will be available soon." |
| Open Datasets | Yes | "We conduct video recognition experiments on two benchmark action recognition datasets, namely UCF101 (Soomro, Zamir, and Shah 2012) and HMDB51 (Kuehne et al. 2011)." |
| Dataset Splits | Yes | "All the experiments follow the training/test splits of UCF101 and HMDB51, and we mostly report the average classification accuracy over the three splits for UCF101, as done in (Hara, Kataoka, and Satoh 2018)." |
| Hardware Specification | Yes | "We use stochastic gradient descent with a momentum of 0.9 on two GTX-1080Ti GPUs." |
| Software Dependencies | No | The paper mentions models like "3D ResNet" and datasets like "Kinetics", but does not specify software components (e.g., programming languages, libraries, frameworks) with version numbers used for the experiments. |
| Experiment Setup | Yes | "We set the mini-batch size as 128 and the initial learning rate as 0.01. We start from a learning rate of 0.05, and assign a weight decay of 5e-4." |
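The quoted hyperparameters are concrete enough to sketch the optimizer setup. Below is a minimal, unofficial reconstruction assuming PyTorch (the paper does not name its framework; see the Software Dependencies row). The mapping of the two quoted learning rates to training phases is an assumption: the quotes do not say which sentence describes pretraining and which fine-tuning, and `model` and `dataset` are placeholders.

```python
import torch

# Hedged reconstruction of the quoted training configuration.
# Assumption: lr=0.01 belongs to self-supervised pretraining, and
# lr=0.05 with weight decay 5e-4 to fine-tuning; the paper's quotes
# do not state this mapping explicitly.

def make_pretrain_optimizer(model: torch.nn.Module) -> torch.optim.SGD:
    # "stochastic gradient descent with a momentum of 0.9"
    return torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def make_finetune_optimizer(model: torch.nn.Module) -> torch.optim.SGD:
    return torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9,
                           weight_decay=5e-4)

# "We set the mini-batch size as 128" maps to the data loader, e.g.:
# loader = torch.utils.data.DataLoader(dataset, batch_size=128, shuffle=True)
```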
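Since the paper ships no pseudocode (see the Pseudocode row), the following rough sketch may help readers orient on what a "space-time cubic puzzle" pretext task looks like: cut a clip into several 3D cubes, shuffle them with a known permutation, and train the network to classify which permutation was applied. The crop count, tensor shapes, and function names below are illustrative assumptions, not the authors' exact procedure.

```python
import itertools
import random
import torch

# Illustrative only: one plausible cubic-puzzle pretext task.
# A clip tensor has shape (C, T, H, W); we cut four temporal cubes,
# shuffle them, and the label is the index of the permutation used.
PERMUTATIONS = list(itertools.permutations(range(4)))  # 4! = 24 classes

def make_puzzle(clip: torch.Tensor) -> tuple[torch.Tensor, int]:
    c, t, h, w = clip.shape
    seg = t // 4
    cubes = [clip[:, i * seg:(i + 1) * seg] for i in range(4)]
    label = random.randrange(len(PERMUTATIONS))
    shuffled = [cubes[i] for i in PERMUTATIONS[label]]
    # Returns the shuffled cubes (4, C, seg, H, W) and the class index
    # the 3D CNN is trained to predict.
    return torch.stack(shuffled), label
```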