SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

Authors: Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei (pp. 10656-10664)

AAAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental SeCo shows superior results under the linear protocol on action recognition (Kinetics), untrimmed activity recognition (ActivityNet) and object tracking (OTB100). More remarkably, SeCo demonstrates considerable improvements over recent unsupervised pre-training techniques, and leads the accuracy by 2.96% and 6.47% against fully-supervised ImageNet pre-training in the action recognition task on UCF101 and HMDB51, respectively. We also empirically verify the merit of SeCo for unsupervised representation learning in three downstream tasks: action recognition, untrimmed activity recognition and object tracking.
Researcher Affiliation Industry Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei JD AI Research, Beijing, China
Pseudocode No No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code Yes Source code is available at https://github.com/YihengZhang-CV/SeCo-Sequence-Contrastive-Learning.
Open Datasets Yes Datasets Kinetics400 dataset (Kay et al. 2017) is one of the large-scale action recognition benchmarks... UCF101 (Soomro, Zamir, and Shah 2012) is one of the most popular action recognition benchmarks... HMDB51 (Kuehne et al. 2011) is another widely used action recognition dataset... ActivityNet dataset (Heilbron et al. 2015) is a large-scale human activity understanding benchmark... GOT-10K (Huang, Zhao, and Huang 2019)... OTB-100 (Wu, Lim, and Yang 2015).
Dataset Splits Yes All the videos are grouped into three subsets for training (240K), validation (20K), and testing (40K), respectively. Because the labels of testing set are not publicly available, the performances on the Kinetics400 dataset are reported on the validation set. This dataset consists of 13,320 videos from 101 action classes, which are split into about 9.5K and 3.7K videos in training and testing set, respectively. The dataset is split into training (3.5K) and testing (1.5K) sets. All the videos in the dataset are divided into 10,024, 4,926, and 5,044 for training, validation, and testing sets, respectively.
Hardware Specification No The paper mentions 'shuffling BN is utilized for multi-GPU training' but does not provide any specific details about the GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware components used for the experiments.
Software Dependencies No The paper does not provide specific version numbers for software dependencies or libraries used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes In our implementations, the size of the mini-batch is set to 512 and the size of memory is 131,072. The momentum coefficient α for the momentum update of the encoder is set to 0.999 and the temperature τ in the InfoNCE loss is 0.1. Following (He et al. 2019), shuffling BN is utilized for multi-GPU training. To optimize the parameters in the encoder, we use momentum SGD with an initial learning rate of 0.2, which is annealed down to zero following a cosine decay. The network is trained for 400 epochs starting from weights initialized with MoCo (He et al. 2019) on ImageNet. For data augmentation, we employ random cropping with random scales, color-jitter, random grayscale, blur, and mirror.
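The hyper-parameters above map onto three standard MoCo-style mechanics: the exponential-moving-average update of the key encoder (α = 0.999), the temperature-scaled InfoNCE loss (τ = 0.1), and the cosine-annealed learning rate (0.2 → 0 over 400 epochs). A minimal pure-Python sketch of these mechanics is below; the function names are illustrative and not taken from the paper's code, and the weights are flattened to plain lists of floats for clarity.

```python
import math

ALPHA = 0.999        # momentum coefficient for the key encoder (from the paper)
TAU = 0.1            # InfoNCE temperature (from the paper)
BASE_LR = 0.2        # initial learning rate for momentum SGD (from the paper)
TOTAL_EPOCHS = 400   # pre-training length (from the paper)

def momentum_update(key_weights, query_weights, alpha=ALPHA):
    """EMA update of the key encoder: w_k <- alpha * w_k + (1 - alpha) * w_q."""
    return [alpha * wk + (1.0 - alpha) * wq
            for wk, wq in zip(key_weights, query_weights)]

def info_nce(pos_sim, neg_sims, tau=TAU):
    """InfoNCE loss for one query: -log( exp(s+/tau) / sum_i exp(s_i/tau) ),
    where the sum runs over the positive and all negative similarities."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(pos_sim / tau - log_sum)

def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=TOTAL_EPOCHS):
    """Cosine decay: base_lr at epoch 0, annealed down to zero at the end."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))
```

For example, `cosine_lr(0)` returns 0.2 and `cosine_lr(200)` returns 0.1, halfway through the schedule; in the paper's setting the memory bank would supply 131,072 negative similarities to `info_nce` per query.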