Memory-Augmented Temporal Dynamic Learning for Action Recognition

Authors: Yuan Yuan, Dong Wang, Qi Wang (pp. 9167-9175)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate this end-to-end system on benchmark datasets (UCF101 and HMDB51) of human action recognition. The experimental results show consistent improvements on both datasets over prior works and our baselines."
Researcher Affiliation | Academia | Yuan Yuan, Dong Wang, Qi Wang; School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China; {y.yuan1.ieee, nwpuwangdong, crabwq}@gmail.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | "Experiments are mainly conducted on two action recognition benchmark datasets: UCF101 (Soomro, Zamir, and Shah 2012) and HMDB51 (Kuehne et al. 2011)."
Dataset Splits | No | The paper states: "For both of them, we follow the provided evaluation protocol and adopt standard training/test splits and report the mean classification accuracy over these splits." This names the standard splits but gives no specific details (percentages or counts) for reproducibility within the paper itself. A sketch of the quoted protocol appears after the table.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory sizes) used for running the experiments. It mentions only that "All the experiments are run on the PyTorch toolbox."
Software Dependencies | No | The paper states: "All the experiments are run on the PyTorch toolbox (Paszke et al. 2017)." It names PyTorch but does not specify its version or any other software dependencies with versions.
Experiment Setup | Yes | "The mini-batch size is set to 64 and the momentum is set to 0.9. We use a small learning rate in our experiments. For spatial-stream networks, the learning rate is initialized as 0.001 and decreased by 1/10 every 6,000 iterations. The training procedure stops after 18,000 iterations. For the temporal stream, we initialize the learning rate as 0.005, which reduces to its 1/10 after 48,000 and 72,000 iterations. The maximum iteration is set as 80,000. We use gradient clipping of 20 to avoid exploding gradients at the early stage." A sketch of this schedule appears after the table.