Predicting Human Interaction via Relative Attention Model
Authors: Yichao Yan, Bingbing Ni, Xiaokang Yang
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy. |
| Researcher Affiliation | Academia | Yichao Yan, Bingbing Ni, Xiaokang Yang Shanghai Jiao Tong University, Shanghai, China {yanyichao, nibingbing, xkyang}@sjtu.edu.cn |
| Pseudocode | No | The paper describes the network architecture and training procedure using equations and text but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper does not provide any link to source code or explicitly state that code for the described methodology is open source or publicly available. |
| Open Datasets | Yes | "Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy." and "UT dataset. The UT-Interaction dataset (UTI) [Ryoo and Aggarwal, 2010]... BIT dataset. The BIT dataset [Kong et al., 2012]" |
| Dataset Splits | Yes | "We adopt 10-folder leave-one-out cross validation setting to measure the performance of the two subsets." and "a random subset containing 272 videos is used for training, and the remaining 128 videos are used for testing." |
| Hardware Specification | Yes | The complete duration of training time is about 12 hours on a Titan X GPU. |
| Software Dependencies | No | The paper mentions "Caffe [Jia et al., 2014]" and "Alexnet [Krizhevsky et al., 2012]" but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The LSTM layer contains 512 hidden units, and a dropout layer is placed after it to avoid overfitting. To increase training instances and to make our model applicable for sequences of variable length, we randomly extract subsequences of fixed length L (L = 10 in our experiments) for training. To train the LSTM networks, the original learning rate is initialized as 0.001, and the learning rate is decreased to 1/10 of the original value after each 10 epochs. The whole training phase includes 30 epochs. During testing, we extract the subsequences in the testing video with a stride of 5, and averaging their classification score as prediction. |
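
The "Experiment Setup" row can be restated as a short training/evaluation sketch. The following is a minimal PyTorch-style illustration, not the authors' Caffe implementation: the feature dimension, class count, dropout rate, optimizer choice, toy data, and classifying from the last LSTM time step are all assumptions. Only the 512 hidden units, the dropout layer after the LSTM, the L = 10 training subsequences, the learning-rate schedule (0.001, divided by 10 every 10 epochs, 30 epochs total), and the stride-5 test-time score averaging come from the quoted text.

```python
# Illustrative PyTorch-style sketch of the quoted experiment setup.
# Assumptions (not stated in the excerpt): 4096-d per-frame features,
# 8 interaction classes, dropout rate 0.5, SGD with momentum, classification
# from the last LSTM time step, and the toy data below.
import random
import torch
import torch.nn as nn

FEAT_DIM = 4096     # assumed per-frame feature dimension
NUM_CLASSES = 8     # assumed number of interaction classes
SUBSEQ_LEN = 10     # L = 10, fixed training subsequence length (from the paper)
TEST_STRIDE = 5     # test subsequences are extracted with a stride of 5 (from the paper)

class InteractionLSTM(nn.Module):
    """LSTM with 512 hidden units followed by a dropout layer, as quoted above."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(FEAT_DIM, 512, batch_first=True)
        self.dropout = nn.Dropout(0.5)               # rate not given in the paper; assumed
        self.fc = nn.Linear(512, NUM_CLASSES)

    def forward(self, x):                            # x: (batch, L, FEAT_DIM)
        out, _ = self.lstm(x)
        return self.fc(self.dropout(out[:, -1]))     # assumption: classify from last step

# Hypothetical toy data: (frame_features, label) pairs standing in for real videos.
train_videos = [(torch.randn(40, FEAT_DIM), torch.tensor(0)) for _ in range(4)]

model = InteractionLSTM()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)  # lr = 0.001
# Learning rate divided by 10 every 10 epochs; 30 epochs in total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

def random_subsequence(video_feats):
    """Randomly crop a fixed-length subsequence of L = 10 frames for training."""
    start = random.randint(0, video_feats.size(0) - SUBSEQ_LEN)
    return video_feats[start:start + SUBSEQ_LEN]

for epoch in range(30):
    model.train()
    for video_feats, label in train_videos:
        clip = random_subsequence(video_feats).unsqueeze(0)   # (1, L, FEAT_DIM)
        loss = criterion(model(clip), label.unsqueeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()

def predict(video_feats):
    """Average classification scores over stride-5 subsequences, then take the argmax."""
    model.eval()
    with torch.no_grad():
        scores = [model(video_feats[s:s + SUBSEQ_LEN].unsqueeze(0))
                  for s in range(0, video_feats.size(0) - SUBSEQ_LEN + 1, TEST_STRIDE)]
    return torch.stack(scores).mean(dim=0).argmax(dim=1)
```

Note that `StepLR(step_size=10, gamma=0.1)` is one direct way to express the quoted schedule of dropping the learning rate to 1/10 of its value after each 10 epochs; how the original Caffe configuration implemented this is not specified in the paper excerpt.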