Structured Two-Stream Attention Network for Video Question Answering
Authors: Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 for Action, Trans., Frame QA and Count tasks. |
| Researcher Affiliation | Collaboration | Lianli Gao,1 Pengpeng Zeng,1 Jingkuan Song,1 Yuan-Fang Li,2 Wu Liu,3 Tao Mei,3 Heng Tao Shen1 1Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China 2Monash University 3JD AI Research lianli.g...@uestc.edu.cn, {is.pengpengzeng,jingkuan.song}@gmail.com, {liuwu,tmei}@live.com, shenhengtao@hotmail.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | To evaluate the performance of the video QA models, we follow two recent video QA works (Jang et al. 2017; Gao et al. 2018a) to evaluate our method on the large-scale public video QA dataset TGIF-QA. TGIF-QA Dataset. It is a large-scale dataset collected by (Jang et al. 2017), which is designed specifically for video QA to better evaluate a model's capacity for deeper video understanding and reasoning. |
| Dataset Splits | No | The paper provides train and test splits in Table 1, but does not explicitly mention a separate validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions 'our implementation is based on the Pytorch library' but does not specify the version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In our experiments, the optimization algorithm is Adamax. The batch size is set as 128. The train epoch is set as 30. In addition, gradient clipping, weight normalization and dropout are employed in training. For text representation, we first encode each word with a pre-trained GloVe embedding to generate a 300-D vector... All the words are further encoded by a one-layer LSTM, whose hidden state has the size of 512. |
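The setup row above quotes the optimizer (Adamax), batch size (128), 30 training epochs, gradient clipping, dropout, 300-D GloVe word embeddings, and a one-layer LSTM with hidden size 512. Since the paper releases no code, the following is only a minimal PyTorch sketch of that text-encoding and training configuration; the module and variable names, toy data, classifier head, dropout rate, and clip norm are assumptions, not the authors' implementation.

```python
# Hedged sketch of the quoted training configuration (PyTorch).
# All unquoted hyperparameters (dropout rate, clip norm, vocabulary size,
# number of classes) and the toy dataset are illustrative assumptions.
import torch
import torch.nn as nn

class QuestionEncoder(nn.Module):
    """300-D (GloVe-initialised) word embeddings -> one-layer LSTM, hidden size 512."""
    def __init__(self, vocab_size, glove_weights=None, p_drop=0.2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)
        if glove_weights is not None:                      # load pre-trained GloVe vectors
            self.embed.weight.data.copy_(glove_weights)
        self.drop = nn.Dropout(p_drop)
        self.lstm = nn.LSTM(300, 512, num_layers=1, batch_first=True)

    def forward(self, tokens):
        out, (h, _) = self.lstm(self.drop(self.embed(tokens)))
        return out, h[-1]                                  # per-word states, question vector

# Toy stand-in data: 1000 questions of length 20, 4 answer classes.
vocab_size, num_classes = 10_000, 4
tokens = torch.randint(0, vocab_size, (1000, 20))
labels = torch.randint(0, num_classes, (1000,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(tokens, labels), batch_size=128, shuffle=True)

model = QuestionEncoder(vocab_size)
classifier = nn.Linear(512, num_classes)                   # stand-in for the full STA model head
optimizer = torch.optim.Adamax(list(model.parameters()) + list(classifier.parameters()))
criterion = nn.CrossEntropyLoss()

for epoch in range(30):                                    # "train epoch is set as 30"
    for x, y in loader:
        optimizer.zero_grad()
        _, q_vec = model(x)
        loss = criterion(classifier(q_vec), y)
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # gradient clipping; 5.0 is assumed
        optimizer.step()
```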