Learning Implicit Temporal Alignment for Few-shot Video Classification

Authors: Songyang Zhang, Jiale Zhou, Xuming He

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on two challenging benchmarks show that our method outperforms the prior arts by a sizable margin on Something-Something V2 and achieves competitive results on Kinetics. In this section, we conduct a series of experiments to validate the effectiveness of our method. Below we first give a brief introduction of the experimental configurations and report the quantitative results on the two benchmarks in Sec. 5.1. Then we conduct ablative experiments to show the efficacy of our model design in Sec. 5.2.
Researcher Affiliation | Academia | Songyang Zhang (1,2,4), Jiale Zhou (1), Xuming He (1,3); 1 ShanghaiTech University; 2 University of Chinese Academy of Sciences; 3 Shanghai Engineering Research Center of Intelligent Vision and Imaging; 4 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences
Pseudocode | No | The paper describes the proposed method using natural language and mathematical equations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and model are available: https://github.com/tonysy/PyAction
Open Datasets | Yes | Following previous works, we use Kinetics [Carreira and Zisserman, 2017b] and Something-Something V2 [Goyal et al., 2017] as the benchmarks.
Dataset Splits | Yes | For the Kinetics dataset, we follow the same split as CMN [Zhu and Yang, 2018], which samples 64 classes for meta-training, 12 classes for validation, and 24 classes for meta-testing. (An illustrative sketch of this class-level split follows the table.)
Hardware Specification | No | The paper describes the experimental setup and training procedures but does not provide specific details regarding the hardware (e.g., GPU models, CPU types) used for running the experiments.
Software Dependencies | No | The paper refers to using ResNet as an embedding network but does not list specific software dependencies with their version numbers (e.g., deep learning frameworks, libraries, or operating systems).
Experiment Setup | Yes | Experimental Configuration: We follow the same video preprocessing procedure as OTAM [Cao et al., 2020]. During training, we first resize each frame in the video to 256×256 and then randomly crop a 224×224 region from the video clip. For the Something-Something V2 dataset, as pointed out in [Cao et al., 2020], the dataset is sensitive to the concepts of left and right, hence we do not use horizontal flipping for this dataset. Following the experiment settings and learning schedule of [Zhu and Yang, 2018] and [Cao et al., 2020], we perform different C-way K-shot experiments on the two datasets and report results with 95% confidence intervals in the meta-test phase. Specifically, the final results are reported over 5 runs, and we randomly sample 20,000 episodes for each run. (An illustrative preprocessing and confidence-interval sketch follows the table.)
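The 64 / 12 / 24 class split noted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code, and the actual CMN split is a fixed published list of classes rather than a random draw; the function name, random shuffle, and placeholder class names below are assumptions for illustration only.

```python
import random

def split_classes(all_classes, n_train=64, n_val=12, n_test=24, seed=0):
    """Partition class labels into disjoint meta-train / meta-val / meta-test
    sets with the 64 / 12 / 24 sizes used for the few-shot Kinetics split.
    NOTE: the real CMN split is a fixed published class list; the random
    shuffle here is only a stand-in for illustration."""
    assert len(all_classes) >= n_train + n_val + n_test
    classes = list(all_classes)
    random.Random(seed).shuffle(classes)
    meta_train = classes[:n_train]
    meta_val = classes[n_train:n_train + n_val]
    meta_test = classes[n_train + n_val:n_train + n_val + n_test]
    return meta_train, meta_val, meta_test

# Example with placeholder class names:
# train_cls, val_cls, test_cls = split_classes([f"class_{i}" for i in range(100)])
```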
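The preprocessing and evaluation protocol in the Experiment Setup row can likewise be sketched as follows. This is a hypothetical torchvision-style pipeline, not the OTAM or PyAction implementation; the function names and the "ssv2" dataset tag are placeholders, and the confidence interval uses the usual 1.96 normal-approximation factor over run-level accuracies, which is one plausible reading of "reported over 5 runs".

```python
import math
import statistics

import torchvision.transforms as T

def build_frame_transform(dataset: str, train: bool = True) -> T.Compose:
    """Per-frame preprocessing: resize to 256x256, then a random 224x224 crop
    during training (center crop otherwise). Horizontal flipping is skipped
    for Something-Something V2 because its labels distinguish left from right."""
    ops = [T.Resize((256, 256))]
    if train:
        ops.append(T.RandomCrop(224))
        if dataset != "ssv2":  # flip only where left/right does not change the label
            ops.append(T.RandomHorizontalFlip())
    else:
        ops.append(T.CenterCrop(224))
    ops.append(T.ToTensor())
    return T.Compose(ops)

def mean_and_ci95(per_run_accuracies):
    """Mean accuracy and 95% confidence half-width over run-level results,
    e.g. 5 runs of 20,000 randomly sampled C-way K-shot episodes each."""
    n = len(per_run_accuracies)
    mean = statistics.mean(per_run_accuracies)
    half_width = 1.96 * statistics.stdev(per_run_accuracies) / math.sqrt(n)
    return mean, half_width
```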