Hierarchical Question-Image Co-Attention for Visual Question Answering

Authors: Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our model improves the state-of-the-art on the VQA dataset from 60.3% to 60.5%, and from 61.6% to 63.3% on the COCO-QA dataset. By using ResNet, the performance is further improved to 62.1% for VQA and 65.4% for COCO-QA. We evaluate our proposed model on two large datasets, VQA [2] and COCO-QA [15]. We also perform ablation studies to quantify the roles of different components in our model.
Researcher Affiliation | Academia | Jiasen Lu, Jianwei Yang, Dhruv Batra, Devi Parikh. Virginia Tech, Georgia Institute of Technology. {jiasenlu, jw2yang, dbatra, parikh}@vt.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code can be downloaded from https://github.com/jiasenlu/HieCoAttenVQA
Open Datasets | Yes | The VQA dataset [2] is the largest dataset for this problem, containing human-annotated questions and answers on the Microsoft COCO dataset [12]. The COCO-QA dataset [15] is automatically generated from captions in the Microsoft COCO dataset [12].
Dataset Splits | Yes | The VQA dataset contains 248,349 training questions, 121,512 validation questions, 244,302 testing questions, and a total of 6,141,630 question-answer pairs. For testing, we train our model on VQA train+val and report the test-dev and test-standard results from the VQA evaluation server (see the split-size sketch below the table).
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | Yes | We use Torch [4] to develop our model. [4] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.
Experiment Setup | Yes | We use the RMSprop optimizer with a base learning rate of 4e-4, momentum 0.99, and weight decay 1e-8. We set the batch size to 300 and train for up to 256 epochs, with early stopping if the validation accuracy has not improved in the last 5 epochs. For COCO-QA, the size of the hidden layer Ws is set to 512; for VQA it is set to 1024, since VQA is a much larger dataset. All other word embedding and hidden layers were vectors of size 512. We apply dropout with probability 0.5 on each layer. (See the training-configuration sketch below the table.)
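
For quick reference, the VQA split sizes quoted in the Dataset Splits row, together with the train+val combination used for test-server evaluation, as a minimal Python sketch (the constant names are illustrative, not taken from the paper's code):

```python
# VQA v1 question counts as quoted in the paper; names are illustrative.
VQA_QUESTIONS = {"train": 248_349, "val": 121_512, "test": 244_302}

# For test-dev / test-standard results, the paper trains on train+val:
train_plus_val = VQA_QUESTIONS["train"] + VQA_QUESTIONS["val"]  # 369,861 questions
```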
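
The experiment setup maps naturally onto an optimizer configuration plus an early-stopping loop. Below is a minimal sketch in PyTorch; the paper's own implementation is in Torch7 (Lua), and `model`, `train_one_epoch`, and `evaluate` are hypothetical placeholders, not names from the released code.

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the hierarchical co-attention model.
model = nn.Linear(512, 1000)

# Optimizer settings quoted from the paper's experiment setup.
optimizer = torch.optim.RMSprop(
    model.parameters(),
    lr=4e-4,            # base learning rate
    momentum=0.99,      # momentum
    weight_decay=1e-8,  # weight decay
)

dropout = nn.Dropout(p=0.5)  # dropout with probability 0.5 on each layer

def evaluate(model: nn.Module) -> float:
    """Hypothetical stand-in for computing validation accuracy."""
    return 0.0

best_acc = 0.0
epochs_without_improvement = 0
for epoch in range(256):  # train for up to 256 epochs
    # train_one_epoch(model, optimizer, batch_size=300)  # hypothetical
    val_acc = evaluate(model)
    if val_acc > best_acc:
        best_acc, epochs_without_improvement = val_acc, 0
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= 5:  # stop after 5 epochs without improvement
        break
```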