Self-Supervised Spatiotemporal Representation Learning by Exploiting Video Continuity
Authors: Hanwen Liang, Niamul Quader, Zhixiang Chi, Lizhe Chen, Peng Dai, Juwei Lu, Yang Wang (pp. 1564-1573)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out extensive experiments and demonstrate the superiority of CPNet in learning more effective video representations. CPNet outperforms prior arts on multiple downstream tasks including action recognition, video retrieval and action localization. Also, the discontinuity localization task is shown to be the most effective pretext task in CPNet, and incorporating it into other typical self-supervised learning methods can bring significant performance gains. |
| Researcher Affiliation | Collaboration | Hanwen Liang 1*, Niamul Quader 1, Zhixiang Chi 1, Lizhe Chen 1, Peng Dai 1*, Juwei Lu 1 and Yang Wang 1,2; 1 Huawei Noah's Ark Lab, 2 University of Manitoba, Canada |
| Pseudocode | No | The paper describes the model architecture and training process using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use the following benchmark datasets to evaluate the efficacy of CPNet, i.e. UCF101 (Soomro, Zamir, and Shah 2012), HMDB51 (Kuehne et al. 2011), Kinetics400 (abbr. K400) (Kay et al. 2017), Diving48 (Li, Li, and Vasconcelos 2018) and Activity Net-v1.3 (Caba Heilbron et al. 2015). |
| Dataset Splits | Yes | For UCF101 and HMDB51, we use the training/testing split 1 for fair comparison to prior works. ... We use 90% of the training split of UCF101 for pretraining. During evaluation, for each dataset, 90% of the training set is used for finetuning (the same 90% used for pretraining on UCF101) and the remaining 10% is used for testing. |
| Hardware Specification | No | The paper does not specify any particular hardware components (e.g., GPU models, CPU types, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'PySceneDetect tool' and the 'BMN framework', as well as stochastic gradient descent (SGD) for optimization, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Stochastic gradient descent (SGD) is used for optimization with an initial learning rate of 0.01. For UCF101 (K400), the model is pretrained with a batch size of 32 (64) for 200 (40) epochs, and the learning rate is decayed by 0.1 at the 100th and 150th (20th and 30th) epochs when the loss plateaus. We let ω = 0.5 in Eq. (3) and w1 = w2 = 1.0, w3 = 0.1 in Eq. (4). We set the length of the input video clip ln as 16 with a resolution of 112×112. |
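The quoted setup fully determines the learning-rate schedule and the loss weighting, which is useful for a reimplementation attempt. Below is a minimal sketch (not the authors' code) that encodes only the hyperparameters quoted above; the function names `lr_at_epoch` and `total_loss` are illustrative, and the three loss terms passed to `total_loss` stand in for the paper's pretext-task losses in Eq. (4).

```python
def lr_at_epoch(epoch, dataset="UCF101", base_lr=0.01, decay=0.1):
    """Step schedule from the paper: initial LR 0.01, decayed by 0.1
    at epochs 100/150 for UCF101 and 20/30 for K400 (total epochs:
    200 and 40 respectively)."""
    milestones = {"UCF101": (100, 150), "K400": (20, 30)}[dataset]
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (decay ** drops)


def total_loss(l1, l2, l3, w1=1.0, w2=1.0, w3=0.1):
    """Weighted sum of the three loss terms with the Eq. (4) weights
    quoted above (w1 = w2 = 1.0, w3 = 0.1)."""
    return w1 * l1 + w2 * l2 + w3 * l3
```

For example, `lr_at_epoch(120, "UCF101")` gives 0.001 (one decay applied), and `lr_at_epoch(35, "K400")` gives 0.0001 (both K400 milestones passed). Note the quote says the decay is triggered "when the loss plateaus", so the fixed milestones here are an approximation of that criterion.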