DBDNet: Learning Bi-directional Dynamics for Early Action Prediction

Authors: Guoliang Pang, Xionghui Wang, Jian-Fang Hu, Qing Zhang, Wei-Shi Zheng

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments on two benchmark action datasets show that learning bi-directional dynamics benefits the early action prediction and our system clearly outperforms the state-of-the-art methods. |
| Researcher Affiliation | Academia | ¹Sun Yat-sen University, China; ²Guangdong Province Key Laboratory of Information Security Technology, China; ³Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China; {panggliang, wxiongh}@mail2.sysu.edu.cn, hujianf5@mail.sysu.edu.cn, zhangqing.whu.cs@gmail.com, wszheng@ieee.org |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions https://github.com/kenshohara/3D-ResNets-PyTorch for a third-party feature extractor, but does not provide an explicit statement or link for the source code of the DBDNet method described in the paper. |
| Open Datasets | Yes | Our experiments on two benchmark datasets (UCF101 and NTU RGB+D action sets) demonstrate that the proposed method can predict actions at early stages and outperform the state-of-the-art by a clear margin on both sets. The UCF101 dataset consists of 13,320 videos from 101 action categories; the NTU RGB+D dataset contains 56,880 RGB+D videos from 60 actions. |
| Dataset Splits | Yes | Following the evaluation criterion in [Kong et al., 2018], we used the videos from the first 15 groups for training, the next 3 groups for validation, and the last 7 groups for testing. (See the split sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or machine specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the Adam algorithm and references a PyTorch implementation for its feature extractor, but does not specify concrete version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We instantiated the motion synthesis and motion reasoning blocks as a one-layer acLSTM with a fully connected layer, and the action prediction block as a one-layer Bi-LSTM. The weight α for fusing the outputs of the motion synthesis and reasoning blocks was set to 0.6 in all experiments; the condition length and ground-truth length in the acLSTMs were set to 1. The hidden sizes of the acLSTM and Bi-LSTM were 2048 and 768, respectively, with a dropout layer (p = 0.5) on top of the Bi-LSTM. DBDNet was optimized with the Adam algorithm and a batch size of 32 in all experiments. On UCF101, the learning rate was 1 × 10⁻⁵ for the motion synthesis and reasoning blocks and 5 × 10⁻⁶ for the action prediction block, with w1 = 1 and w2 = 0.01. On NTU RGB+D, the learning rate was 1 × 10⁻⁴ for the motion synthesis and reasoning blocks and 1 × 10⁻³ for the action prediction block, with w1 = 1 and w2 = 0.1. (A configuration sketch follows this table.) |
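
For the UCF101 split quoted above, here is a minimal sketch of the group-based partition, assuming filenames follow the standard UCF101 pattern `v_<Action>_g<Group>_c<Clip>.avi`; the function name and example inputs are hypothetical, not from the paper.

```python
import re

def ucf101_group_split(video_names):
    """Partition UCF101 videos by group id: groups 1-15 train,
    16-18 validation, 19-25 test (the protocol of [Kong et al., 2018])."""
    train, val, test = [], [], []
    for name in video_names:
        # The group id is encoded in the filename, e.g. v_ApplyEyeMakeup_g08_c01.avi
        group = int(re.search(r"_g(\d+)_", name).group(1))
        if group <= 15:
            train.append(name)
        elif group <= 18:
            val.append(name)
        else:
            test.append(name)
    return train, val, test

# Example: one clip per split.
train, val, test = ucf101_group_split([
    "v_ApplyEyeMakeup_g08_c01.avi",  # train (group 8)
    "v_Archery_g17_c02.avi",         # validation (group 17)
    "v_Basketball_g21_c03.avi",      # test (group 21)
])
```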
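
The experiment setup above maps to a small amount of PyTorch. The sketch below is our own reconstruction under stated assumptions, not the authors' released code: `MotionBlock` stands in for the paper's acLSTM (the auto-conditioning schedule with condition/ground-truth length 1 is omitted), the α-weighted sum and the time-reversed input to the reasoning block are guesses at the fusion and backward-dynamics wiring, and the losses weighted by w1 and w2 are not shown.

```python
import torch
import torch.nn as nn

class MotionBlock(nn.Module):
    """One-layer LSTM + fully connected layer (simplified stand-in for acLSTM)."""
    def __init__(self, feat_dim=2048, hidden=2048):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden, feat_dim)

    def forward(self, x):                        # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.fc(out)

class DBDNet(nn.Module):
    """Hypothetical wiring of the reported blocks and hyper-parameters."""
    def __init__(self, feat_dim=2048, num_classes=101, alpha=0.6):
        super().__init__()
        self.synthesis = MotionBlock(feat_dim)   # forward (future) dynamics
        self.reasoning = MotionBlock(feat_dim)   # backward (past) dynamics
        self.alpha = alpha                       # fusion weight, 0.6 in the paper
        self.bilstm = nn.LSTM(feat_dim, 768, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(p=0.5)         # dropout on top of the Bi-LSTM
        self.classifier = nn.Linear(2 * 768, num_classes)

    def forward(self, feats):
        # feats: (batch, time, feat_dim) clip features, e.g. extracted
        # with 3D-ResNets-PyTorch as referenced in the paper.
        future = self.synthesis(feats)
        past = self.reasoning(torch.flip(feats, dims=[1]))      # assumed reversal
        fused = self.alpha * future + (1 - self.alpha) * past   # assumed fusion
        out, _ = self.bilstm(fused)
        return self.classifier(self.dropout(out[:, -1]))

model = DBDNet()
# Per-block Adam learning rates as reported for UCF101 (batch size 32).
optimizer = torch.optim.Adam([
    {"params": model.synthesis.parameters(), "lr": 1e-5},
    {"params": model.reasoning.parameters(), "lr": 1e-5},
    {"params": list(model.bilstm.parameters())
             + list(model.classifier.parameters()), "lr": 5e-6},
])
```

For NTU RGB+D, the same wiring would apply with 60 classes and the reported learning rates of 1 × 10⁻⁴ for the motion blocks and 1 × 10⁻³ for the action prediction block.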