Structured Two-Stream Attention Network for Video Question Answering

Authors: Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen. Pages 6391-6398

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 for Action, Trans., Frame QA and Count tasks.
Researcher Affiliation Collaboration Lianli Gao,1 Pengpeng Zeng,1 Jingkuan Song,1 Yuan-Fang Li,2 Wu Liu,3 Tao Mei,3 Heng Tao Shen1 1Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China 2Monash University 3JD AI Research lianli.g...@uestc.edu.cn, {is.pengpengzeng,jingkuan.song}@gmail.com, {liuwu,tmei}@live.com, shenhengtao@hotmail.com
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes To evaluate the performance of the video QA models, we follow two recent video QA works (Jang et al. 2017; Gao et al. 2018a) to evaluate our method on the large-scale public video QA dataset TGIF-QA. TGIF-QA Dataset. It is a large-scale dataset collected by (Jang et al. 2017), which is designed specifically for video QA to better evaluate a model's capacity for deeper video understanding and reasoning.
Dataset Splits No The paper provides train and test splits in Table 1, but does not explicitly mention a separate validation split for hyperparameter tuning or early stopping.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies No The paper mentions 'our implementation is based on the Pytorch library' but does not specify the version number for PyTorch or any other software dependencies.
Experiment Setup Yes In our experiments, the optimization algorithm is Adamax. The batch size is set as 128. The train epoch is set as 30. In addition, gradient clipping, weight normalization and dropout are employed in training. For text representation, we first encode each word with a pre-trained GloVe embedding to generate a 300-D vector... All the words are further encoded by a one-layer LSTM, whose hidden state has the size of 512.
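The reported setup (300-D GloVe-style embeddings into a one-layer LSTM with a 512-D hidden state, trained with Adamax at batch size 128 with gradient clipping) can be sketched in PyTorch, the library the paper says it uses. This is a minimal illustration, not the authors' released code: the vocabulary size, sequence length, and clip norm are illustrative assumptions, and the embedding table here is randomly initialized where the paper uses pre-trained GloVe weights.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters; only EMBED_DIM, HIDDEN, and the batch
# size of 128 come from the paper. VOCAB and the clip norm are assumed.
VOCAB, EMBED_DIM, HIDDEN = 10_000, 300, 512

class TextEncoder(nn.Module):
    """Question/text encoder: word embedding -> one-layer LSTM."""
    def __init__(self):
        super().__init__()
        # The paper initializes this from pre-trained GloVe vectors.
        self.embed = nn.Embedding(VOCAB, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN, num_layers=1, batch_first=True)

    def forward(self, tokens):
        # Returns per-word hidden states of size 512, as described.
        return self.lstm(self.embed(tokens))[0]  # (batch, seq, 512)

encoder = TextEncoder()
optimizer = torch.optim.Adamax(encoder.parameters())  # Adamax, per the paper

# One illustrative training step: forward, backward, clip, update.
tokens = torch.randint(0, VOCAB, (128, 12))  # batch of 128 questions
out = encoder(tokens)
out.mean().backward()                         # dummy loss for the sketch
torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=0.25)
optimizer.step()
print(tuple(out.shape))  # (128, 12, 512)
```

In a full run this step would repeat over 30 epochs, with dropout and weight normalization applied as the paper states but omitted here for brevity.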