Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition
Authors: Yichao Cao, Xiu Su, Qingfei Tang, Shan You, Xiaobo Lu, Chang Xu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed method is evaluated on two popular benchmarks: UCF101 [31] and HMDB51 [21]... Table 2 shows the comparison of our method to state-of-the-art... We conduct several experiments on hand-engineered and searched architectures. In the Plain model, we follow TimeSformer [2] and LeViT [16] to construct a plain model, which has a structure similar to our Transformer space, but the intermediate modules are set manually. |
| Researcher Affiliation | Collaboration | Yichao Cao1, Xiu Su2, Qingfei Tang3, Shan You4, Xiaobo Lu1, Chang Xu2 (1 School of Automation, Southeast University; 2 School of Computer Science, Faculty of Engineering, The University of Sydney; 3 Enbo Technology Co., Ltd., China; 4 SenseTime Research) |
| Pseudocode | Yes | Algorithm 1: Training supernet with Transformer space shrinking (a hedged sketch of such a loop appears after the table) |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] |
| Open Datasets | Yes | Our proposed method is evaluated on two popular benchmarks: UCF101 [31] and HMDB51 [21], using the split from [27]. |
| Dataset Splits | Yes | The dataset is separated into a training set Dtr = {(xi, yi) \| yi ∈ Ctrain}, a validation set Dval = {(xi, yi) \| yi ∈ Cvalidation}, and a test set Dtest = {(xi, yi) \| yi ∈ Ctest}, where the validation set is split from the training set with 1/10 of the samples... (a sketch of this split appears after the table) |
| Hardware Specification | Yes | All experiments are implemented with 8 Nvidia 1080Ti GPUs. |
| Software Dependencies | No | During training, the AdamW optimization method is used to train the supernet from scratch. (This mentions a software component but lacks specific version numbers for it or any other key libraries.) |
| Experiment Setup | Yes | For each n-way k-shot episodic task, we randomly sample n classes, with each class containing k examples, as the support set. We then randomly sample one example for each class in the n classes as the query set... For video preprocessing, the sparse sampling strategy is used to fetch T frames for each video. During training, we resize sampled frames to 256 × 256 and then randomly crop a 224 × 224 region as input. During testing, the random crop is replaced by a center crop... the AdamW optimization method is used to train the supernet from scratch. (An episode-sampling and preprocessing sketch appears after the table.) |
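
The Pseudocode row names Algorithm 1 ("Training supernet with Transformer space shrinking"), but the listing itself is not quoted in the extraction. The sketch below is a generic, heavily hedged illustration of what one-shot supernet training with progressive space shrinking usually looks like; it is not the authors' algorithm, and every name in it (`train_supernet_with_shrinking`, `train_step`, `evaluate`, the shrink schedule) is hypothetical.

```python
import random

def train_supernet_with_shrinking(
    space,             # dict: layer index -> list of candidate ops (hypothetical)
    train_step,        # callable(subnet): trains shared weights on one batch
    evaluate,          # callable(subnet) -> validation score on episodes
    epochs=100,
    shrink_every=20,   # assumed shrinking interval, not from the paper
    drop_per_layer=1,  # assumed number of candidates dropped per layer
):
    """Generic one-shot NAS sketch: single-path training plus space shrinking."""
    for epoch in range(epochs):
        # Sample one random subnet from the current (possibly shrunk) space
        # and update the shared supernet weights through it.
        subnet = {layer: random.choice(ops) for layer, ops in space.items()}
        train_step(subnet)

        # Periodically score each surviving candidate op and drop the worst,
        # so later epochs concentrate training on promising choices.
        if (epoch + 1) % shrink_every == 0:
            for layer, ops in space.items():
                if len(ops) <= 1:
                    continue  # nothing left to shrink at this layer
                scores = {}
                for op in ops:
                    probe = {l: random.choice(o) for l, o in space.items()}
                    probe[layer] = op  # pin the op under evaluation
                    scores[op] = evaluate(probe)
                worst = sorted(ops, key=scores.get)[:drop_per_layer]
                space[layer] = [op for op in ops if op not in worst]
    return space
```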
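The Dataset Splits row describes train/validation/test sets with the validation set carved out of the training data at a 1/10 ratio. Below is a minimal sketch of that split, assuming the 1/10 ratio applies to samples within each training class, per the quoted sentence; note that the set notation (yi ∈ Cvalidation) could also be read as a class-disjoint split. The function and argument names are ours, not the authors'.

```python
import random
from collections import defaultdict

def split_train_val(samples, train_classes, seed=0):
    """Hypothetical sketch of the split quoted above: group samples by class,
    then move 1/10 of each training class's samples into the validation set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))

    d_tr, d_val = [], []
    for c in train_classes:
        items = list(by_class[c])
        rng.shuffle(items)
        n_val = max(1, len(items) // 10)  # the quoted 1/10 validation ratio
        d_val.extend(items[:n_val])
        d_tr.extend(items[n_val:])
    return d_tr, d_val
```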
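The Experiment Setup row covers two mechanics: building n-way k-shot episodes with one query per class, and preprocessing clips (sparse sampling of T frames; a 256 × 256 resize, then a 224 × 224 random crop at training time or center crop at test time). The snippet below illustrates both with standard torchvision transforms; it is a sketch under those quoted settings, not the authors' released pipeline, and the helper names are ours.

```python
import random
from torchvision import transforms

# Train-time preprocessing quoted above: resize to 256x256, random 224x224 crop.
train_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])
# Test-time: the random crop is replaced by a center crop.
test_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def sparse_sample_indices(num_frames, t, rng=random):
    """Sparse sampling: divide the video into t equal segments and draw one
    frame index from each, so the T frames cover the whole clip."""
    seg = num_frames / t
    return [int(i * seg + rng.random() * seg) for i in range(t)]

def sample_episode(videos_by_class, n, k, rng=random):
    """One n-way k-shot episode: k support clips plus 1 query clip per class,
    matching the quoted protocol (the query set has one sample per class)."""
    classes = rng.sample(sorted(videos_by_class), n)
    support, query = [], []
    for label, c in enumerate(classes):
        clips = rng.sample(videos_by_class[c], k + 1)
        support += [(clip, label) for clip in clips[:k]]
        query.append((clips[k], label))
    return support, query
```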