Multi-Scale Spatial-Temporal Integration Convolutional Tube for Human Action Recognition
Authors: Haoze Wu, Jiawei Liu, Xierong Zhu, Meng Wang, Zheng-Jun Zha
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that our MSTI-Net significantly boosts the performance of existing convolution networks and achieves state-of-the-art accuracy on three challenging benchmarks, i.e., UCF-101, HMDB-51 and Kinetics-400, with much fewer parameters and FLOPs. |
| Researcher Affiliation | Academia | Haoze Wu¹, Jiawei Liu¹, Xierong Zhu¹, Meng Wang² and Zheng-Jun Zha¹; ¹University of Science and Technology of China, ²Hefei University of Technology |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulas but no pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We use three widely-used and challenging benchmarks, i.e. Kinetics-400 [Kay et al., 2017], UCF-101 [Soomro et al., 2012], and HMDB-51 [Kuehne et al., 2013] in the experiments. |
| Dataset Splits | No | Both UCF-101 and HMDB-51 consist of three training/test splits provided by the dataset organizers. The paper mentions these official training/test splits but does not specify a separate validation split or its size/methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam Gradient Descent optimizer' but does not specify software names with version numbers for libraries, frameworks, or other dependencies. |
| Experiment Setup | Yes | Our data augmentation includes random clipping on both spatial dimension (by firstly resizing the smaller video side to 256 pixels, then randomly cropping a 224×224 patch) and temporal dimension (by randomly picking the starting frame among those early enough to guarantee a desired number of frames). We use the Adam Gradient Descent optimizer with an initial learning rate of 1e-4 to train the MSTI-related networks from scratch. The dropout ratio is set to 0.5 and the weight decay rate is set to 5e-5. The gradient descent optimizer has the 1e-5 initial learning rate, and it is adopted with a momentum of 0.9 to train our MSTI-Net initialized with the Kinetics-400 and ImageNet-1k pre-trained model. To prevent over-fitting, we further employ a higher dropout ratio of 0.9 and a weight decay rate of 5e-4. (A hedged code sketch of this configuration follows the table.) |
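
The experiment-setup row above describes the spatial/temporal augmentation and two training regimes: training from scratch with Adam, and fine-tuning pre-trained weights with SGD plus momentum. The snippet below is a minimal PyTorch sketch of that configuration, not the authors' implementation (no code is released); the stand-in model, the clip-sampling helper, and the class count are hypothetical placeholders.

```python
import random
import torch
import torchvision.transforms as T

def sample_clip_start(num_video_frames: int, clip_len: int) -> int:
    """Randomly pick a starting frame early enough to yield `clip_len` frames."""
    latest_start = max(num_video_frames - clip_len, 0)
    return random.randint(0, latest_start)

# Spatial augmentation: resize the smaller side to 256 pixels, then take a
# random 224x224 crop, as quoted from the paper.
spatial_transform = T.Compose([
    T.Resize(256),
    T.RandomCrop(224),
    T.ToTensor(),
])

# Hypothetical stand-in for the MSTI network; the real architecture is not released.
model = torch.nn.Sequential(
    torch.nn.Conv3d(3, 64, kernel_size=3),   # placeholder backbone
    torch.nn.AdaptiveAvgPool3d(1),
    torch.nn.Flatten(),
    torch.nn.Dropout(p=0.5),                  # 0.5 from scratch, 0.9 when fine-tuning
    torch.nn.Linear(64, 400),                 # e.g. 400 Kinetics-400 classes
)

# Regime 1: training from scratch with Adam, lr 1e-4, weight decay 5e-5.
optimizer_scratch = torch.optim.Adam(
    model.parameters(), lr=1e-4, weight_decay=5e-5)

# Regime 2: fine-tuning from Kinetics-400 / ImageNet-1k pre-trained weights with
# SGD, lr 1e-5, momentum 0.9, weight decay 5e-4 (and dropout raised to 0.9).
optimizer_finetune = torch.optim.SGD(
    model.parameters(), lr=1e-5, momentum=0.9, weight_decay=5e-4)
```

The two optimizers mirror the two regimes quoted in the table; only the hyperparameters stated in the paper are used here, while everything else (helper names, placeholder layers) is assumed for illustration.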