Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition
Authors: Yichao Cao, Xiu Su, Qingfei Tang, Shan You, Xiaobo Lu, Chang Xu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed method is evaluated on two popular benchmarks: UCF101 [31] and HMDB51 [21]... Table 2 shows the comparison of our method to state-of-the-art... We conduct several experiments on hand-engineered and searched architectures. In the Plain model, we follow TimeSformer [2] and LeViT [16] to construct a plain model, which has a structure similar to our Transformer space, but the intermediate modules are set manually. |
| Researcher Affiliation | Collaboration | Yichao Cao1, Xiu Su2, Qingfei Tang3, Shan You4, Xiaobo Lu1, Chang Xu2 (1 School of Automation, Southeast University; 2 School of Computer Science, Faculty of Engineering, The University of Sydney; 3 Enbo Technology Co., Ltd., China; 4 SenseTime Research) |
| Pseudocode | Yes | Algorithm 1: Training supernet with Transformer space shrinking (a hedged sketch of such a loop appears after the table) |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] |
| Open Datasets | Yes | Our proposed method is evaluated on two popular benchmarks: UCF101 [31] and HMDB51 [21], using the split from [27]. |
| Dataset Splits | Yes | The dataset is separated into a training set Dtr = {(xi, yi) \| yi ∈ Ctrain}, a validation set Dval = {(xi, yi) \| yi ∈ Cvalidation}, and a test set Dtest = {(xi, yi) \| yi ∈ Ctest}, where the validation set is split from the training set with 1/10 of the samples... (a sketch of this split appears after the table) |
| Hardware Specification | Yes | All experiments are implemented with 8 Nvidia 1080Ti GPUs. |
| Software Dependencies | No | During training, the AdamW optimization method is used to train the supernet from scratch. (This mentions a software component but lacks specific version numbers for it or any other key libraries.) |
| Experiment Setup | Yes | For each n-way k-shot episodic task, we randomly sample n classes, with each class containing k examples, as the support set. We then randomly sample one example for each class in the n classes as the query set... For video preprocessing, the sparse sampling strategy is used to fetch T frames for each video. During training, we resize sampled frames to 256 × 256 and then randomly crop a 224 × 224 region as input. During testing, the random crop is replaced by a center crop... the AdamW optimization method is used to train the supernet from scratch. (An episode-sampling and preprocessing sketch appears after the table.) |
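
The Pseudocode row names Algorithm 1 ("Training supernet with Transformer space shrinking"), but the listing itself is not quoted in the extraction. The sketch below is a generic, heavily hedged illustration of what one-shot supernet training with progressive space shrinking usually looks like; it is not the authors' algorithm, and every name in it (`train_supernet_with_shrinking`, `train_step`, `evaluate`, the shrink schedule) is hypothetical.

```python
import random

def train_supernet_with_shrinking(
    space,             # dict: layer index -> list of candidate ops (hypothetical)
    train_step,        # callable(subnet): trains shared weights on one batch
    evaluate,          # callable(subnet) -> validation score on episodes
    epochs=100,
    shrink_every=20,   # assumed shrinking interval, not from the paper
    drop_per_layer=1,  # assumed number of candidates dropped per layer
):
    """Generic one-shot NAS sketch: single-path training plus space shrinking."""
    for epoch in range(epochs):
        # Sample one random subnet from the current (possibly shrunk) space
        # and update the shared supernet weights through it.
        subnet = {layer: random.choice(ops) for layer, ops in space.items()}
        train_step(subnet)

        # Periodically score each surviving candidate op and drop the worst,
        # so later epochs concentrate training on promising choices.
        if (epoch + 1) % shrink_every == 0:
            for layer, ops in space.items():
                if len(ops) <= 1:
                    continue  # nothing left to shrink at this layer
                scores = {}
                for op in ops:
                    probe = {l: random.choice(o) for l, o in space.items()}
                    probe[layer] = op  # pin the op under evaluation
                    scores[op] = evaluate(probe)
                worst = sorted(ops, key=scores.get)[:drop_per_layer]
                space[layer] = [op for op in ops if op not in worst]
    return space
```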
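The Dataset Splits row describes train/validation/test sets with the validation set carved out of the training data at a 1/10 ratio. Below is a minimal sketch of that split, assuming the 1/10 ratio applies to samples within each training class, per the quoted sentence; note that the set notation (yi ∈ Cvalidation) could also be read as a class-disjoint split. The function and argument names are ours, not the authors'.

```python
import random
from collections import defaultdict

def split_train_val(samples, train_classes, seed=0):
    """Hypothetical sketch of the split quoted above: group samples by class,
    then move 1/10 of each training class's samples into the validation set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))

    d_tr, d_val = [], []
    for c in train_classes:
        items = list(by_class[c])
        rng.shuffle(items)
        n_val = max(1, len(items) // 10)  # the quoted 1/10 validation ratio
        d_val.extend(items[:n_val])
        d_tr.extend(items[n_val:])
    return d_tr, d_val
```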
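The Experiment Setup row covers two mechanics: building n-way k-shot episodes with one query per class, and preprocessing clips (sparse sampling of T frames; a 256 × 256 resize, then a 224 × 224 random crop at training time or center crop at test time). The snippet below illustrates both with standard torchvision transforms; it is a sketch under those quoted settings, not the authors' released pipeline, and the helper names are ours.

```python
import random
from torchvision import transforms

# Train-time preprocessing quoted above: resize to 256x256, random 224x224 crop.
train_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])
# Test-time: the random crop is replaced by a center crop.
test_tf = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def sparse_sample_indices(num_frames, t, rng=random):
    """Sparse sampling: divide the video into t equal segments and draw one
    frame index from each, so the T frames cover the whole clip."""
    seg = num_frames / t
    return [int(i * seg + rng.random() * seg) for i in range(t)]

def sample_episode(videos_by_class, n, k, rng=random):
    """One n-way k-shot episode: k support clips plus 1 query clip per class,
    matching the quoted protocol (the query set has one sample per class)."""
    classes = rng.sample(sorted(videos_by_class), n)
    support, query = [], []
    for label, c in enumerate(classes):
        clips = rng.sample(videos_by_class[c], k + 1)
        support += [(clip, label) for clip in clips[:k]]
        query.append((clips[k], label))
    return support, query
```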