DBDNet: Learning Bi-directional Dynamics for Early Action Prediction
Authors: Guoliang Pang, Xionghui Wang, Jian-Fang Hu, Qing Zhang, Wei-Shi Zheng
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on two benchmark action datasets show that learning bi-directional dynamics benefits the early action prediction and our system clearly outperforms the state-of-the-art methods. |
| Researcher Affiliation | Academia | Sun Yat-sen University, China; Guangdong Province Key Laboratory of Information Security Technology, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China. {panggliang, wxiongh}@mail2.sysu.edu.cn, hujianf5@mail.sysu.edu.cn, zhangqing.whu.cs@gmail.com, wszheng@ieee.org |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions 'https://github.com/kenshohara/3D-ResNets-PyTorch' for a third-party feature extractor, but does not provide an explicit statement or link for the source code of the DBDNet methodology described in the paper. (A hedged feature-extraction sketch follows the table.) |
| Open Datasets | Yes | Our experiments on two benchmark datasets (UCF101 and NTU RGB+D action sets) demonstrate that the proposed method can predict actions at early stages and outperform the state-of-the-art by a clear margin on both sets. UCF101: the dataset consists of 13,320 videos from 101 action categories. NTU RGB+D: the dataset contains 56,880 RGB+D videos from 60 actions. |
| Dataset Splits | Yes | Following the evaluation criterion in [Kong et al., 2018], we used the videos from the first 15 groups for training, the next 3 groups for validation, and the last 7 groups for testing. (See the split sketch after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam algorithm' and references a PyTorch implementation for a feature extractor, but it does not specify concrete version numbers for general software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We instantiated the motion synthesis block and motion reasoning block as a one-layer acLSTM with a fully connected layer. We defined the action prediction block as a one-layer Bi-LSTM. The weight α for fusing the outputs of the motion synthesis and reasoning blocks was set as 0.6 in all of our experiments. The parameters condition length and ground-truth length in the acLSTMs were set as 1. We set the hidden sizes of the acLSTM and Bi-LSTM as 2048 and 768, respectively. We placed a dropout layer on top of the Bi-LSTM, where the probability was set as 0.5. We optimized our DBDNet using the Adam algorithm with a batch size of 32 in all of our experiments. For the experiments on the UCF101 set... The learning rate was set as 1×10⁻⁵ for both the motion synthesis and reasoning blocks, and 5×10⁻⁶ for the action prediction block. The parameters w1 and w2 were set as 1 and 0.01, respectively. For the experiments on the NTU RGB+D action set... The learning rate was set as 1×10⁻⁴ for both the motion synthesis and reasoning blocks, and 1×10⁻³ for the action prediction block. The parameters w1 and w2 were set as 1 and 0.1, respectively. (See the configuration sketch after the table.) |
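The open-source-code row references the authors' third-party feature extractor (kenshohara/3D-ResNets-PyTorch). Since that repository is not an installable package, the minimal sketch below uses torchvision's `r3d_18` as a stand-in to illustrate clip-level feature extraction; the model choice, clip shape, and the `Identity`-head trick are our assumptions, not the paper's setup. Note that `r3d_18` yields 512-d features, whereas the paper's acLSTM hidden size of 2048 suggests a larger backbone.

```python
import torch
from torchvision.models.video import r3d_18

# Hedged stand-in for the paper's 3D-CNN feature extractor. In practice,
# load pretrained weights; this sketch only illustrates the extraction step.
model = r3d_18()
model.fc = torch.nn.Identity()  # drop the classifier head to expose features
model.eval()

clip = torch.randn(1, 3, 16, 112, 112)  # (batch, channels, frames, H, W)
with torch.no_grad():
    feats = model(clip)  # -> (1, 512) clip-level feature vector
```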
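The dataset-splits row quotes the group-based UCF101 protocol of [Kong et al., 2018]: groups 1-15 for training, 16-18 for validation, 19-25 for testing. A minimal sketch of that split follows, assuming the standard UCF101 filename convention in which the group id is embedded as `_gXX_` (e.g. `v_ApplyEyeMakeup_g08_c01.avi`); the directory path is a placeholder.

```python
import re
from pathlib import Path

# Group id embedded in standard UCF101 filenames, e.g. "_g08_".
GROUP_RE = re.compile(r"_g(\d{2})_")

def split_ucf101(video_dir="UCF-101"):
    """Split UCF101 videos by group, per the Kong et al. (2018) protocol:
    groups 1-15 train, 16-18 validation, 19-25 test."""
    train, val, test = [], [], []
    for path in Path(video_dir).rglob("*.avi"):
        m = GROUP_RE.search(path.name)
        if m is None:
            continue  # skip files without a recognizable group id
        group = int(m.group(1))
        if group <= 15:
            train.append(path)
        elif group <= 18:
            val.append(path)
        else:
            test.append(path)
    return train, val, test
```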
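Finally, the experiment-setup row lists enough hyperparameters to sketch the reported configuration in PyTorch. The sketch below is not the authors' implementation: the class names are placeholders, the acLSTM's scheduled conditioning (condition length and ground-truth length of 1) is omitted, and the exact fusion rule for α = 0.6 is our assumption. It only pins down the stated sizes (one-layer LSTM blocks, hidden sizes 2048 and 768, dropout 0.5) and the per-block Adam learning rates reported for UCF101.

```python
import torch
import torch.nn as nn

FEAT_DIM = 2048    # assumed to match the acLSTM hidden size reported above
NUM_CLASSES = 101  # UCF101

class MotionBlock(nn.Module):
    """One-layer LSTM + fully connected layer, standing in for the paper's
    acLSTM-based motion synthesis / reasoning blocks (conditioning omitted)."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=2048):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden, feat_dim)

    def forward(self, x):            # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.fc(out)          # predicted features, same shape as x

class PredictionBlock(nn.Module):
    """One-layer Bi-LSTM with dropout 0.5, producing per-step class scores."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=768, num_classes=NUM_CLASSES):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, num_layers=1,
                              batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(p=0.5)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        out, _ = self.bilstm(x)
        return self.fc(self.dropout(out))

class DBDNetSketch(nn.Module):
    def __init__(self, alpha=0.6):   # fusion weight from the paper
        super().__init__()
        self.alpha = alpha
        self.synthesis = MotionBlock()   # forward dynamics
        self.reasoning = MotionBlock()   # backward dynamics
        self.prediction = PredictionBlock()

    def forward(self, feats):
        # Weighted fusion of forward and (re-reversed) backward dynamics;
        # the exact fusion rule is an assumption, not stated in the quote.
        fwd = self.synthesis(feats)
        bwd = self.reasoning(torch.flip(feats, dims=[1]))
        fused = self.alpha * fwd + (1 - self.alpha) * torch.flip(bwd, dims=[1])
        return self.prediction(fused)

model = DBDNetSketch()
# Per-block Adam learning rates reported for UCF101 (batch size 32).
optimizer = torch.optim.Adam([
    {"params": model.synthesis.parameters(), "lr": 1e-5},
    {"params": model.reasoning.parameters(), "lr": 1e-5},
    {"params": model.prediction.parameters(), "lr": 5e-6},
])
```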