Self-Supervised Spatiotemporal Representation Learning by Exploiting Video Continuity
Authors: Hanwen Liang, Niamul Quader, Zhixiang Chi, Lizhe Chen, Peng Dai, Juwei Lu, Yang Wang (pp. 1564-1573)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out extensive experiments and demonstrate the superiority of CPNet in learning more effective video representations. CPNet outperforms prior arts on multiple downstream tasks including action recognition, video retrieval and action localization. Also, the discontinuity localization task is shown to be the most effective pretext task in CPNet, and incorporating it into other typical self-supervised learning methods can bring significant performance gains. |
| Researcher Affiliation | Collaboration | Hanwen Liang 1*, Niamul Quader 1, Zhixiang Chi 1, Lizhe Chen 1, Peng Dai 1*, Juwei Lu 1 and Yang Wang 1,2; 1 Huawei Noah's Ark Lab, 2 University of Manitoba, Canada |
| Pseudocode | No | The paper describes the model architecture and training process using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use the following benchmark datasets to evaluate the efficacy of CPNet, i.e. UCF101 (Soomro, Zamir, and Shah 2012), HMDB51 (Kuehne et al. 2011), Kinetics400 (abbr. K400) (Kay et al. 2017), Diving48 (Li, Li, and Vasconcelos 2018) and Activity Net-v1.3 (Caba Heilbron et al. 2015). |
| Dataset Splits | Yes | For UCF101 and HMDB51, we use the training/testing split 1 for fair comparison to prior works. ... We use 90% of the training split of UCF101 for pretraining. During evaluation, for each dataset, 90% of the training set is used for finetuning (the same 90% used for pretraining on UCF101) and the remaining 10% is used for testing. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'PySceneDetect tool' and the 'BMN framework', as well as stochastic gradient descent (SGD) for optimization, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Stochastic gradient descent (SGD) is used for optimization with an initial learning rate of 0.01. For UCF101 (K400), the model is pretrained with a batch size of 32 (64) for 200 (40) epochs, and the learning rate is decayed by 0.1 at the 100th and 150th (20th and 30th) epochs when the loss plateaus. We let ω = 0.5 in Eq. (3) and w1 = w2 = 1.0, w3 = 0.1 in Eq. (4). We set the length of the input video clip ln as 16 with a resolution of 112×112. |
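The quoted setup fully determines the learning-rate schedule and the loss weighting, which is useful for a reimplementation attempt. Below is a minimal sketch (not the authors' code) that encodes only the hyperparameters quoted above; the function names `lr_at_epoch` and `total_loss` are illustrative, and the three loss terms passed to `total_loss` stand in for the paper's pretext-task losses in Eq. (4).

```python
def lr_at_epoch(epoch, dataset="UCF101", base_lr=0.01, decay=0.1):
    """Step schedule from the paper: initial LR 0.01, decayed by 0.1
    at epochs 100/150 for UCF101 and 20/30 for K400 (total epochs:
    200 and 40 respectively)."""
    milestones = {"UCF101": (100, 150), "K400": (20, 30)}[dataset]
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (decay ** drops)


def total_loss(l1, l2, l3, w1=1.0, w2=1.0, w3=0.1):
    """Weighted sum of the three loss terms with the Eq. (4) weights
    quoted above (w1 = w2 = 1.0, w3 = 0.1)."""
    return w1 * l1 + w2 * l2 + w3 * l3
```

For example, `lr_at_epoch(120, "UCF101")` gives 0.001 (one decay applied), and `lr_at_epoch(35, "K400")` gives 0.0001 (both K400 milestones passed). Note the quote says the decay is triggered "when the loss plateaus", so the fixed milestones here are an approximation of that criterion.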