Structured Two-Stream Attention Network for Video Question Answering

Authors: Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen. Pages 6391-6398

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 for Action, Trans., Frame QA and Count tasks.
Researcher Affiliation Collaboration Lianli Gao,1 Pengpeng Zeng,1 Jingkuan Song,1 Yuan-Fang Li,2 Wu Liu,3 Tao Mei,3 Heng Tao Shen1 1Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China 2Monash University 3JD AI Research lianli.g...@uestc.edu.cn, {is.pengpengzeng,jingkuan.song}@gmail.com, {liuwu,tmei}@live.com, shenhengtao@hotmail.com
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes To evaluate the performance of the video QA models, we follow two recent video QA works (Jang et al. 2017; Gao et al. 2018a) to evaluate our method on the large-scale public video QA dataset TGIF-QA. TGIF-QA Dataset. It is a large-scale dataset collected by (Jang et al. 2017), which is designed specifically for video QA to better evaluate a model's capacity for deeper video understanding and reasoning.
Dataset Splits No The paper provides train and test splits in Table 1, but does not explicitly mention a separate validation split for hyperparameter tuning or early stopping.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies No The paper mentions 'our implementation is based on the Pytorch library' but does not specify the version number for PyTorch or any other software dependencies.
Experiment Setup Yes In our experiments, the optimization algorithm is Adamax. The batch size is set as 128. The train epoch is set as 30. In addition, gradient clipping, weight normalization and dropout are employed in training. For text representation, we first encode each word with a pre-trained GloVe embedding to generate a 300-D vector... All the words are further encoded by a one-layer LSTM, whose hidden state has the size of 512.
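The reported setup (300-D GloVe-style embeddings into a one-layer LSTM with a 512-D hidden state, trained with Adamax at batch size 128 with gradient clipping) can be sketched in PyTorch, the library the paper says it uses. This is a minimal illustration, not the authors' released code: the vocabulary size, sequence length, and clip norm are illustrative assumptions, and the embedding table here is randomly initialized where the paper uses pre-trained GloVe weights.

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters; only EMBED_DIM, HIDDEN, and the batch
# size of 128 come from the paper. VOCAB and the clip norm are assumed.
VOCAB, EMBED_DIM, HIDDEN = 10_000, 300, 512

class TextEncoder(nn.Module):
    """Question/text encoder: word embedding -> one-layer LSTM."""
    def __init__(self):
        super().__init__()
        # The paper initializes this from pre-trained GloVe vectors.
        self.embed = nn.Embedding(VOCAB, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN, num_layers=1, batch_first=True)

    def forward(self, tokens):
        # Returns per-word hidden states of size 512, as described.
        return self.lstm(self.embed(tokens))[0]  # (batch, seq, 512)

encoder = TextEncoder()
optimizer = torch.optim.Adamax(encoder.parameters())  # Adamax, per the paper

# One illustrative training step: forward, backward, clip, update.
tokens = torch.randint(0, VOCAB, (128, 12))  # batch of 128 questions
out = encoder(tokens)
out.mean().backward()                         # dummy loss for the sketch
torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=0.25)
optimizer.step()
print(tuple(out.shape))  # (128, 12, 512)
```

In a full run this step would repeat over 30 epochs, with dropout and weight normalization applied as the paper states but omitted here for brevity.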