Predicting Human Interaction via Relative Attention Model

Authors: Yichao Yan, Bingbing Ni, Xiaokang Yang

IJCAI 2017

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy." |
| Researcher Affiliation | Academia | "Yichao Yan, Bingbing Ni, Xiaokang Yang, Shanghai Jiao Tong University, Shanghai, China, {yanyichao, nibingbing, xkyang}@sjtu.edu.cn" |
| Pseudocode | No | The paper describes the network architecture and training procedure using equations and text but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | The paper does not provide any link to source code or explicitly state that code for the described methodology is open source or publicly available. |
| Open Datasets | Yes | "Extensive experiments have been conducted on two public datasets, and the results demonstrate that the proposed relative attention network successfully predicts informative regions between interacting subjects, which in turn yields superior human interaction prediction accuracy." and "UT dataset. The UT-Interaction dataset (UTI) [Ryoo and Aggarwal, 2010]... BIT dataset. The BIT dataset [Kong et al., 2012]" |
| Dataset Splits | Yes | "We adopt 10-folder leave-one-out cross validation setting to measure the performance of the two subsets." and "a random subset containing 272 videos is used for training, and the remaining 128 videos are used for testing." |
| Hardware Specification | Yes | "The complete duration of training time is about 12 hours on a Titan X GPU." |
| Software Dependencies | No | The paper mentions "Caffe [Jia et al., 2014]" and "Alexnet [Krizhevsky et al., 2012]" but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "The LSTM layer contains 512 hidden units, and a dropout layer is placed after it to avoid overfitting. To increase training instances and to make our model applicable for sequences of variable length, we randomly extract subsequences of fixed length L (L = 10 in our experiments) for training. To train the LSTM networks, the original learning rate is initialized as 0.001, and the learning rate is decreased to 1/10 of the original value after each 10 epochs. The whole training phase includes 30 epochs. During testing, we extract the subsequences in the testing video with a stride of 5, and averaging their classification score as prediction." |