Attention-Aware Sampling via Deep Reinforcement Learning for Action Recognition
Authors: Wenkai Dong, Zhaoxiang Zhang, Tieniu Tan
AAAI 2019, pp. 8247–8254 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve very competitive action recognition performance on two widely used action recognition datasets. We conduct experiments on two widely used benchmark datasets to demonstrate the effectiveness of our method and achieve competitive results. |
| Researcher Affiliation | Academia | Wenkai Dong,1,3 Zhaoxiang Zhang,1,2,3 Tieniu Tan1,2,3 1Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR) 2Center for Excellence in Brain Science and Intelligence Technology (CEBSIT), Institute of Automation, Chinese Academy of Sciences (CASIA) 3University of Chinese Academy of Sciences |
| Pseudocode | Yes | Algorithm 1 Attention-aware sampling agent training; Algorithm 2 Attention-aware sampling agent testing |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., specific repository link, explicit code release statement, or code in supplementary materials) for the source code. |
| Open Datasets | Yes | We evaluate our approach on two challenging action recognition datasets: UCF101 (Soomro, Zamir, and Shah 2012) and HMDB51 (Kuehne et al. 2011). |
| Dataset Splits | Yes | The learning rate is initialized as 0.001 and decreases to 1/10 of its value after 190 and 300 epochs. The maximum epoch is set as 340. For the models pre-trained on ImageNet... The whole training procedure stops at 80 epochs. For the models pre-trained on Kinetics, the learning rate is initialized as 0.001 and decreases to 1/10 of its value after 10 and 20 epochs. The maximum epoch is set as 30 for both streams. For the attention-aware sampling network, the learning rate is initialized as 10e-5 and decreases to 1/10 of its value after 20 and 40 epochs. The whole training procedure stops at 45 epochs. |
| Hardware Specification | Yes | We use PyTorch to train our deep neural network, and specifically, 4 GTX 1080Ti GPUs are used for parallel computing. |
| Software Dependencies | No | For the extraction of optical flow, we choose the TVL1 (Zach, Pock, and Bischof 2007) optical flow algorithm implemented in OpenCV with CUDA. We use PyTorch to train our deep neural network. The paper mentions software components like "PyTorch", "OpenCV", and "CUDA" but does not specify their version numbers. (A hedged OpenCV TVL1 extraction sketch appears below the table.) |
| Experiment Setup | Yes | We use the mini-batch stochastic gradient descent algorithm to learn the baseline model parameters, where the batch size is set to 128, weight decay to 5e-4, and momentum to 0.9. We initialize network weights with pre-trained models from ImageNet and Kinetics. ... The learning rate is initialized as 0.001 and decreases to 1/10 of its value after 190 and 300 epochs. The maximum epoch is set as 340. For the extraction of optical flow, we choose the TVL1 (Zach, Pock, and Bischof 2007) optical flow algorithm implemented in OpenCV with CUDA. For the spatial network, the learning rate is initialized as 0.001 and decreases to 1/10 of its value after 30 and 60 epochs. The whole training procedure stops at 80 epochs. ... The learning rate is initialized as 10e-5 and decreases to 1/10 of its value after 20 and 40 epochs. The whole training procedure stops at 45 epochs. The weight decay and hyperparameter β are set to 10e-5 and 0.1, respectively. (A hedged PyTorch sketch of this optimizer and learning-rate schedule appears below the table.) |
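
The optimizer and schedule quoted in the Experiment Setup row map directly onto standard PyTorch components. The sketch below is a minimal reconstruction, not the authors' released code: the ResNet-50 backbone, the 101-class output (UCF101), and the training-loop stub are assumptions; only the SGD hyperparameters (lr 0.001, momentum 0.9, weight decay 5e-4), the 1/10 decay at epochs 190 and 300, and the 340-epoch budget come from the quoted text.

```python
# Hedged sketch, assuming a ResNet-50 backbone and UCF101 (101 classes);
# only the optimizer/schedule values are taken from the paper's quoted setup.
import torch
import torch.nn as nn
from torchvision.models import resnet50

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet50(num_classes=101).to(device)   # placeholder backbone, not the paper's
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)             # paper reports 4 GTX 1080Ti GPUs

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.001,            # initial learning rate from the paper
    momentum=0.9,        # momentum from the paper
    weight_decay=5e-4,   # weight decay from the paper
)
# Learning rate drops to 1/10 of its value after epochs 190 and 300.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[190, 300], gamma=0.1
)

for epoch in range(340):                       # maximum epoch set to 340
    # train_one_epoch(model, optimizer, loader) would run one pass over the data
    scheduler.step()
```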
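
The TVL1 optical-flow extraction mentioned in the Software Dependencies row can likewise be sketched with OpenCV. The paper uses the CUDA implementation; the version below uses the CPU variant exposed by opencv-contrib-python (`cv2.optflow.DualTVL1OpticalFlow_create`) so it runs without a CUDA build. The clipping bound and uint8 quantization are common two-stream conventions, not values taken from the paper, and the frame paths are illustrative.

```python
# Hedged sketch of per-frame-pair TVL1 flow extraction (CPU variant);
# bound/quantization are conventional choices, not from the paper.
import cv2
import numpy as np

def extract_tvl1_flow(prev_path, next_path, bound=20.0):
    """Compute TVL1 optical flow between two consecutive frames and
    quantize it into two uint8 displacement maps (x and y)."""
    prev = cv2.imread(prev_path, cv2.IMREAD_GRAYSCALE)
    nxt = cv2.imread(next_path, cv2.IMREAD_GRAYSCALE)
    tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
    flow = tvl1.calc(prev, nxt, None)            # H x W x 2 float32 (dx, dy)
    flow = np.clip(flow, -bound, bound)          # clip large displacements
    flow = ((flow + bound) / (2 * bound) * 255.0).astype(np.uint8)
    return flow[..., 0], flow[..., 1]            # x- and y-displacement maps
```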