High Performance Gesture Recognition via Effective and Efficient Temporal Modeling
Authors: Yang Yi, Feng Ni, Yuexin Ma, Xinge Zhu, Yuankai Qi, Riming Qiu, Shijie Zhao, Feng Li, Yongtao Wang
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments conducted on public datasets demonstrate that our proposed model achieves the state-of-the-art with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules, and the experiments on other tasks, like video understanding and video-based person re-identification, also display their good performance in efficiency and capability of generalization. |
| Researcher Affiliation | Collaboration | Yang Yi (1), Feng Ni (2), Yuexin Ma (3), Xinge Zhu (4), Yuankai Qi (5), Riming Qiu (1), Shijie Zhao (1), Feng Li (1) and Yongtao Wang (2); (1) Media Lab, Tencent; (2) Peking University; (3) University of Hong Kong; (4) The Chinese University of Hong Kong; (5) Harbin Institute of Technology, Weihai, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code will be available at https://github.com/nemonameless/Gesture-Recognition. |
| Open Datasets | Yes | IsoGD [Wan et al., 2016] is a large-scale multi-modality gesture dataset which contains 249 gesture classes. Jester [TwentyBN, 2017] is a large collection of densely-labeled video clips of hand gestures. Something-Something-V1 [Goyal et al., 2017] is a challenging dataset that shows basic actions with everyday objects. MARS [Zheng et al., 2016] is the largest video-based person re-identification dataset. |
| Dataset Splits | Yes | This dataset is split into three subsets: 35,878 videos for training, 5,784 videos for validation and 6,271 videos for testing. |
| Hardware Specification | Yes | The proposed networks are trained with the PyTorch deep learning framework on NVIDIA Tesla P40 GPUs with CUDA 8.0. |
| Software Dependencies | No | The paper mentions the "PyTorch deep learning framework" and "CUDA 8.0" but does not specify the version number for PyTorch itself, which is a key software dependency. |
| Experiment Setup | Yes | Unless otherwise noted, we set temporal segments T = 8. Following the data augmentation strategies of TSN [Wang et al., 2016], the frames are cropped and resized to 224×224 after aspect ratio jittering and scale jittering. For all experiments, we adopt mini-batch SGD to optimize the model with momentum of 0.9 and weight decay of 5e-4. We train for 60 epochs with cross entropy loss and batch size of 48. The learning rate is initialized as 0.01 and reduced by a factor of 10 every 20 epochs. A dropout layer with ratio of 0.5 is added before the classification layer. |
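
The Experiment Setup row maps onto a standard PyTorch training configuration. The sketch below is a minimal illustration of the reported hyperparameters (SGD with momentum 0.9, weight decay 5e-4, initial learning rate 0.01 decayed by 10x every 20 epochs, dropout 0.5 before the classifier, 60 epochs, batch size 48, cross entropy loss, 224×224 inputs). It is not taken from the authors' released code: the simplified torchvision augmentation and the `attach_classifier` helper are assumptions standing in for the paper's TSN-style pipeline and backbone.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Hyperparameters as reported in the paper's experiment setup.
NUM_SEGMENTS = 8        # temporal segments T
BATCH_SIZE = 48
NUM_EPOCHS = 60
BASE_LR = 0.01
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4
DROPOUT_RATIO = 0.5

# Per-frame spatial augmentation resized to 224x224. The TSN codebase's
# GroupMultiScaleCrop (aspect ratio + scale jittering) is approximated here
# by torchvision's RandomResizedCrop; this is a simplified stand-in.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.66, 1.0)),
    transforms.ToTensor(),
])

def attach_classifier(backbone: nn.Module, feat_dim: int, num_classes: int) -> nn.Module:
    """Hypothetical helper: place dropout(0.5) before the classification layer
    of a ResNet-style backbone that exposes a `.fc` attribute."""
    backbone.fc = nn.Sequential(
        nn.Dropout(p=DROPOUT_RATIO),
        nn.Linear(feat_dim, num_classes),
    )
    return backbone

def make_optimizer(model: nn.Module):
    """Mini-batch SGD with momentum and weight decay, LR reduced 10x every 20 epochs."""
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=BASE_LR,
        momentum=MOMENTUM,
        weight_decay=WEIGHT_DECAY,
    )
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
    return optimizer, scheduler

# Cross entropy loss, as stated in the paper.
criterion = nn.CrossEntropyLoss()
```

The sketch omits the paper's actual backbone and the proposed MKTB/GRB modules; it only reproduces the optimization and augmentation settings quoted in the table above.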