Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Authors: Dahun Kim, Donghyeon Cho, In So Kweon
AAAI 2019, pp. 8545-8552 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we demonstrate that our learned 3D representation is well transferred to action recognition tasks, and outperforms state-of-the-art 2D CNN-based competitors on UCF101 and HMDB51 datasets. |
| Researcher Affiliation | Academia | Dahun Kim, Donghyeon Cho, In So Kweon Dept. of Electrical Engineering, KAIST, Daejeon, Korea mcahny@kaist.ac.kr, cdh12242@gmail.com, iskweon77@kaist.ac.kr |
| Pseudocode | No | The paper does not contain any sections explicitly labeled as 'Pseudocode' or 'Algorithm', nor are there structured code-like blocks. |
| Open Source Code | No | All the pre-trained models and the source codes will be available soon. |
| Open Datasets | Yes | We conduct video recognition experiments on two benchmark action recognition datasets, namely UCF101 (Soomro, Zamir, and Shah 2012) and HMDB51 (Kuehne et al. 2011). |
| Dataset Splits | Yes | All the experiments follow the training/test splits of UCF101 and HMDB51, and we mostly report the average classification accuracy over the three splits for UCF101, as done in (Hara, Kataoka, and Satoh 2018). |
| Hardware Specification | Yes | We use stochastic gradient descent with a momentum of 0.9 on two GTX-1080Ti GPUs. |
| Software Dependencies | No | The paper mentions models like '3D ResNet' and datasets like 'Kinetics', but does not specify software components (e.g., programming languages, libraries, frameworks) with version numbers used for the experiments. |
| Experiment Setup | Yes | We set the mini-batch size as 128 and the initial learning rate as 0.01. ... We start from a learning rate of 0.05, and assign a weight decay of 5e-4. (Two separate quotes from the paper; see the configuration sketch below the table.) |
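
Since the paper's code was not yet released ("All the pre-trained models and the source codes will be available soon"), the pretext task named in the title can only be sketched. Below is a minimal, hypothetical PyTorch rendering of a space-time cubic puzzle head, assuming the task described in the paper: four 3D crops are cut from a clip, shuffled, and a shared 3D CNN must classify which of the 4! = 24 permutations was applied. The names `CubicPuzzleNet` and `make_puzzle`, the placeholder backbone interface, and all tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import itertools
import random
import torch
import torch.nn as nn

PERMS = list(itertools.permutations(range(4)))  # all 4! = 24 piece orderings


class CubicPuzzleNet(nn.Module):
    """Hypothetical puzzle head: shared 3D CNN per piece, then a permutation classifier."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone  # processes one cube at a time -> (B, feat_dim)
        self.classifier = nn.Linear(4 * feat_dim, len(PERMS))

    def forward(self, pieces):  # pieces: (B, 4, C, T, H, W)
        feats = [self.backbone(pieces[:, i]) for i in range(4)]
        return self.classifier(torch.cat(feats, dim=1))


def make_puzzle(cubes):
    """Shuffle the 4 cubes of one sample; return (shuffled cubes, permutation id)."""
    label = random.randrange(len(PERMS))
    return cubes[list(PERMS[label])], label
```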
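Likewise, the Hardware Specification and Experiment Setup rows translate into an optimizer configuration. This is a hedged sketch of the quoted settings (SGD with momentum 0.9, weight decay 5e-4, mini-batch size 128, two GPUs), reusing `CubicPuzzleNet` from above; the dummy backbone and dataset are placeholders, and the two quoted initial learning rates (0.01 and 0.05) appear to belong to different training stages.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder standing in for the paper's 3D ResNet backbone; shapes are illustrative.
backbone = nn.Sequential(
    nn.Conv3d(3, 64, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),  # -> (B, 64)
)
model = CubicPuzzleNet(backbone, feat_dim=64)
if torch.cuda.device_count() > 1:  # the paper reports two GTX-1080Ti GPUs
    model = nn.DataParallel(model).cuda()

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.05,            # the paper quotes 0.01 and 0.05 as initial learning rates
    momentum=0.9,       # momentum as quoted
    weight_decay=5e-4,  # weight decay as quoted
)

# Dummy data just to exercise the quoted mini-batch size of 128.
clips = torch.randn(8, 4, 3, 8, 16, 16)  # (N, pieces, C, T, H, W), illustrative
labels = torch.randint(0, 24, (8,))      # 24 permutation classes
loader = DataLoader(TensorDataset(clips, labels), batch_size=128, shuffle=True)
```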