Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
Authors: Yujia Zhang, Lai-Man Po, Xuyuan Xu, Mengyang Liu, Yexin Wang, Weifeng Ou, Yuzhi Zhao, Wing-Yin Yu
AAAI 2022, pp. 3380-3389 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our proposed STOR task can favor both contrastive learning and pretext tasks. The joint optimization scheme can significantly improve the spatio-temporal representation in video understanding. ... Extensive experimental evaluations on two downstream video understanding tasks demonstrate the effectiveness of the proposed approach. ... Ablation studies demonstrate the efficacy of the proposed STOR and the mutual influence of contrastive learning and pretext tasks. |
| Researcher Affiliation | Collaboration | (1) Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China; (2) AI Technology Center, OVB, Tencent, Shenzhen, China |
| Pseudocode | No | The paper includes pipeline diagrams (e.g., Figure 2, Figure 4) but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Katou2/CSTP. |
| Open Datasets | Yes | Kinetics-400 (Carreira and Zisserman 2017) is one of the large-scale action recognition benchmarks... UCF-101 (Soomro, Zamir, and Shah 2012) is a widely used benchmark... HMDB-51 (Kuehne et al. 2011) is also a small-scale dataset... |
| Dataset Splits | Yes | UCF-101 (Soomro, Zamir, and Shah 2012) ... It has three splits... HMDB-51 (Kuehne et al. 2011) ... consists of three splits... The results are summarized in Table 1. In the table, Base means the basic data augmentation methods, which include multi-scale random cropping, random Gaussian blur, random color jittering, and random temporal jittering (a sketch of such a pipeline follows the table below). |
| Hardware Specification | No | The paper mentions using network backbones like C3D, R(2+1)D, and S3D, but it does not specify the actual hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks like Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | R(2+1)D was adopted as the backbone, and experiments were conducted on split 1 of UCF-101. ... To explore the mutual influence of multiple contrastive learning schemes and different pretext tasks, we conducted experiments on four popular contrastive learning schemes: SimCLR, MoCo, BYOL, and SimSiam. ... We conducted 6 sets of candidates to demonstrate the influence of the choice of candidates: 2 candidates [0.5, 1], 3 candidates [0.33, 0.66, 0.99], 4 candidates [0.25, 0.5, 0.75, 1.0], 5 candidates [0.2, 0.4, 0.6, 0.8, 1.0], 6 candidates [0.166, 0.332, 0.498, 0.664, 0.83, 1.0], and 7 candidates [0.143, 0.286, 0.429, 0.572, 0.715, 0.858, 1] (see the sketches following the table). |
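
To make the "Base" augmentations quoted in the Dataset Splits row concrete, here is a minimal sketch of such a pipeline using standard torchvision transforms. The parameter values (crop size, blur kernel, jitter strengths, clip length) and the helper name `temporal_jitter` are illustrative assumptions, not the authors' settings or code.

```python
import random
import torch
from torchvision import transforms

# Hypothetical "Base" augmentation pipeline: multi-scale random cropping,
# random Gaussian blur, random color jittering. All parameter values are
# assumptions for illustration.
spatial_aug = transforms.Compose([
    # Multi-scale random cropping: sample a crop over a range of scales,
    # then resize to the network input resolution.
    transforms.RandomResizedCrop(112, scale=(0.4, 1.0)),
    # Random Gaussian blur, applied with probability 0.5.
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.5),
    # Random color jittering on brightness, contrast, saturation, hue.
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
])

def temporal_jitter(frames: torch.Tensor, clip_len: int = 16) -> torch.Tensor:
    """Random temporal jittering: sample a random start index so repeated
    draws from the same video yield different clips. `frames` is assumed
    to be a (T, C, H, W) frame tensor."""
    start = random.randint(0, max(frames.shape[0] - clip_len, 0))
    return frames[start:start + clip_len]
```

Whether the same spatial parameters are shared across all frames of a clip or resampled per frame is a further design choice the report does not pin down.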
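The candidate lists in the Experiment Setup row are, up to rounding, n evenly spaced overlap rates. The sketch below assumes the task is cast as n-way classification over these candidates; `make_candidates` and `nearest_candidate_index` are hypothetical helper names, not identifiers from the CSTP repository.

```python
def make_candidates(n: int) -> list[float]:
    """n evenly spaced overlap-rate candidates, e.g. n=4 -> [0.25, 0.5,
    0.75, 1.0]. The paper's listed values differ slightly in rounding
    (e.g. [0.33, 0.66, 0.99] for n=3)."""
    return [round((i + 1) / n, 3) for i in range(n)]

def nearest_candidate_index(rate: float, candidates: list[float]) -> int:
    """Map a continuous overlap rate to the closest candidate, turning
    overlap-rate prediction into an n-way classification target."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - rate))

for n in (2, 4, 7):
    print(n, make_candidates(n))
# 2 [0.5, 1.0]
# 4 [0.25, 0.5, 0.75, 1.0]
# 7 [0.143, 0.286, 0.429, 0.571, 0.714, 0.857, 1.0]
```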
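For the four contrastive schemes compared (SimCLR, MoCo, BYOL, SimSiam), the report quotes no code; as a point of reference for what "contrastive learning" means in this setting, below is a generic SimCLR-style NT-Xent loss over paired clip embeddings. It is a standard textbook formulation, not the CSTP implementation, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent loss for two batches of paired embeddings z1, z2 of shape
    (B, D): each embedding's positive is its counterpart in the other batch,
    and all remaining 2B - 2 embeddings serve as negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2B, D), unit-norm rows
    sim = z @ z.t() / tau                         # temperature-scaled cosine sims
    sim.fill_diagonal_(float('-inf'))             # mask self-similarity
    B = z1.shape[0]
    # Row i < B pairs with row i + B, and vice versa.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)
```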