Contrastive Spatio-Temporal Pretext Learning for Self-Supervised Video Representation
Authors: Yujia Zhang, Lai-Man Po, Xuyuan Xu, Mengyang Liu, Yexin Wang, Weifeng Ou, Yuzhi Zhao, Wing-Yin Yu
AAAI 2022, pp. 3380-3389 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our proposed STOR task can favor both contrastive learning and pretext tasks. The joint optimization scheme can significantly improve the spatio-temporal representation in video understanding. ... Extensive experimental evaluations on two downstream video understanding tasks demonstrate the effectiveness of the proposed approach. ... Ablation studies demonstrate the efficacy of the proposed STOR and the mutual influence of contrastive learning and pretext tasks. |
| Researcher Affiliation | Collaboration | (1) Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China; (2) AI Technology Center, OVB, Tencent, Shenzhen, China |
| Pseudocode | No | The paper includes pipeline diagrams (e.g., Figure 2, Figure 4) but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Katou2/CSTP. |
| Open Datasets | Yes | Kinetics-400 (Carreira and Zisserman 2017) is one of the large-scale action recognition benchmarks... UCF-101 (Soomro, Zamir, and Shah 2012) is a widely used benchmark... HMDB-51 (Kuehne et al. 2011) is also a small-scale dataset... |
| Dataset Splits | Yes | UCF-101 (Soomro, Zamir, and Shah 2012) ... It has three splits... HMDB-51 (Kuehne et al. 2011) ... consists of three splits... The results are summarized in Table 1. In the table, Base means the basic data augmentation methods, which include multi-scale random cropping, random Gaussian blur, random color jittering, and random temporal jittering (a sketch of such a pipeline follows the table below). |
| Hardware Specification | No | The paper mentions using network backbones like C3D, R(2+1)D, and S3D, but it does not specify the actual hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies (e.g., programming languages, libraries, or frameworks like Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | R(2+1)D was adopted as the backbone, and experiments were conducted on split 1 of UCF-101. ... To explore the mutual influence of multiple contrastive learning schemes and different pretext tasks, we conducted experiments on four popular contrastive learning schemes: SimCLR, MoCo, BYOL, and SimSiam. ... We conducted 6 sets of candidates to demonstrate the influence of the choice of candidates: 2 candidates [0.5, 1], 3 candidates [0.33, 0.66, 0.99], 4 candidates [0.25, 0.5, 0.75, 1.0], 5 candidates [0.2, 0.4, 0.6, 0.8, 1.0], 6 candidates [0.166, 0.332, 0.498, 0.664, 0.83, 1.0], and 7 candidates [0.143, 0.286, 0.429, 0.572, 0.715, 0.858, 1] (see the sketches following the table). |
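
To make the "Base" augmentations quoted in the Dataset Splits row concrete, here is a minimal sketch of such a pipeline using standard torchvision transforms. The parameter values (crop size, blur kernel, jitter strengths, clip length) and the helper name `temporal_jitter` are illustrative assumptions, not the authors' settings or code.

```python
import random
import torch
from torchvision import transforms

# Hypothetical "Base" augmentation pipeline: multi-scale random cropping,
# random Gaussian blur, random color jittering. All parameter values are
# assumptions for illustration.
spatial_aug = transforms.Compose([
    # Multi-scale random cropping: sample a crop over a range of scales,
    # then resize to the network input resolution.
    transforms.RandomResizedCrop(112, scale=(0.4, 1.0)),
    # Random Gaussian blur, applied with probability 0.5.
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.5),
    # Random color jittering on brightness, contrast, saturation, hue.
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
])

def temporal_jitter(frames: torch.Tensor, clip_len: int = 16) -> torch.Tensor:
    """Random temporal jittering: sample a random start index so repeated
    draws from the same video yield different clips. `frames` is assumed
    to be a (T, C, H, W) frame tensor."""
    start = random.randint(0, max(frames.shape[0] - clip_len, 0))
    return frames[start:start + clip_len]
```

Whether the same spatial parameters are shared across all frames of a clip or resampled per frame is a further design choice the report does not pin down.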
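The candidate lists in the Experiment Setup row are, up to rounding, n evenly spaced overlap rates. The sketch below assumes the task is cast as n-way classification over these candidates; `make_candidates` and `nearest_candidate_index` are hypothetical helper names, not identifiers from the CSTP repository.

```python
def make_candidates(n: int) -> list[float]:
    """n evenly spaced overlap-rate candidates, e.g. n=4 -> [0.25, 0.5,
    0.75, 1.0]. The paper's listed values differ slightly in rounding
    (e.g. [0.33, 0.66, 0.99] for n=3)."""
    return [round((i + 1) / n, 3) for i in range(n)]

def nearest_candidate_index(rate: float, candidates: list[float]) -> int:
    """Map a continuous overlap rate to the closest candidate, turning
    overlap-rate prediction into an n-way classification target."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - rate))

for n in (2, 4, 7):
    print(n, make_candidates(n))
# 2 [0.5, 1.0]
# 4 [0.25, 0.5, 0.75, 1.0]
# 7 [0.143, 0.286, 0.429, 0.571, 0.714, 0.857, 1.0]
```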
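For the four contrastive schemes compared (SimCLR, MoCo, BYOL, SimSiam), the report quotes no code; as a point of reference for what "contrastive learning" means in this setting, below is a generic SimCLR-style NT-Xent loss over paired clip embeddings. It is a standard textbook formulation, not the CSTP implementation, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent loss for two batches of paired embeddings z1, z2 of shape
    (B, D): each embedding's positive is its counterpart in the other batch,
    and all remaining 2B - 2 embeddings serve as negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2B, D), unit-norm rows
    sim = z @ z.t() / tau                         # temperature-scaled cosine sims
    sim.fill_diagonal_(float('-inf'))             # mask self-similarity
    B = z1.shape[0]
    # Row i < B pairs with row i + B, and vice versa.
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(B)])
    return F.cross_entropy(sim, targets)
```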