Explicit Knowledge-based Reasoning for Visual Question Answering

Authors: Peng Wang, Qi Wu, Chunhua Shen, Anthony Dick, Anton van den Hengel

IJCAI 2017

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We first evaluate different approaches automatically using simple string matching and Wu-Palmer similarity (WUPS) [Malinowski and Fritz, 2014a], in which the human answers are considered as ground truth.
Researcher Affiliation | Academia | Peng Wang 1,2, Qi Wu 3, Chunhua Shen 2,3, Anthony Dick 2, Anton van den Hengel 2,3; 1 Northwestern Polytechnical University, China; 2 The University of Adelaide, Australia; 3 Australian Centre for Robotic Vision
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper provides a link to its dataset ('https://bitbucket.org/sxjzwq1987/kb-vqa-dataset') but does not state that the source code for the proposed method is available.
Open Datasets | Yes | We select 700 of the validation images from the MS COCO [Lin et al., 2014] dataset... The LSTM is trained on the training set of VQA data [Antol et al., 2015].
Dataset Splits | No | The paper mentions selecting '700 of the validation images from the MS COCO' to build its dataset, but does not specify a validation split for the KB-VQA dataset itself or for training the primary model (Ahab); the LSTM baseline refers only to a 'training set' and a 'test set'.
Hardware Specification | No | The paper does not report the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper names software such as Quepy and NLTK ('Quepy begins by tagging each word in the question using NLTK [Bird et al., 2009]') but does not give version numbers for these or other dependencies.
Experiment Setup | Yes | Specifically, we use the second fully-connected layer (4096-d) of a pre-trained VGG model as the image features, and the LSTM is trained on the training set of VQA data [Antol et al., 2015]. The LSTM layer contains 512 memory cells in each unit.
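The 'Research Type' row quotes the paper's automatic evaluation, which scores predicted answers against human answers with Wu-Palmer similarity (WUPS). As a point of reference, here is a minimal Python sketch of thresholded WUPS-style scoring built on NLTK's WordNet interface; it follows the metric as defined by Malinowski and Fritz (2014) but is not the authors' evaluation code, and the threshold value and tokenised answer inputs are assumptions.

```python
# Minimal sketch of thresholded WUPS-style scoring using NLTK's WordNet
# Wu-Palmer similarity. Illustrative only; the official Malinowski & Fritz
# script differs in detail (tokenisation, handling of multi-word answers).
from nltk.corpus import wordnet as wn  # requires the 'wordnet' corpus to be downloaded

def soft_match(word_a, word_b, threshold=0.9):
    """Best Wu-Palmer similarity over all synset pairs; scores below the
    threshold are down-weighted by 0.1, as in WUPS@0.9."""
    syns_a, syns_b = wn.synsets(word_a), wn.synsets(word_b)
    if not syns_a or not syns_b:
        return 1.0 if word_a == word_b else 0.0
    best = max((a.wup_similarity(b) or 0.0) for a in syns_a for b in syns_b)
    return best if best >= threshold else 0.1 * best

def wups(predicted, ground_truth, threshold=0.9):
    """Symmetric per-question score: min of the two directed products
    over answer tokens (lists of words)."""
    def directed(src, dst):
        score = 1.0
        for w in src:
            score *= max(soft_match(w, v, threshold) for v in dst)
        return score
    return min(directed(predicted, ground_truth), directed(ground_truth, predicted))

# Example: compare a predicted answer against a human answer.
print(wups(["cat"], ["kitten"]))  # a value in [0, 1]
```

Averaging this per-question score over all questions gives the dataset-level WUPS figure; string matching corresponds to the exact-equality special case.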
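The 'Software Dependencies' row quotes the paper's statement that Quepy begins by tagging each word of the question using NLTK. The snippet below sketches only that tagging step; the example question is invented for illustration, and the exact NLTK resource names vary between releases (the paper gives no versions).

```python
# Minimal sketch of the NLTK part-of-speech tagging step that Quepy applies
# to a question before building a query. The question is an invented example.
import nltk

# Resource names differ slightly across NLTK releases; these are the classic ones.
nltk.download('punkt', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)

question = "Which animal in this image is able to climb trees?"
tokens = nltk.word_tokenize(question)
print(nltk.pos_tag(tokens))
# e.g. [('Which', 'WDT'), ('animal', 'NN'), ('in', 'IN'), ('this', 'DT'), ...]
```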
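The 'Experiment Setup' row describes the LSTM baseline: pre-extracted 4096-d VGG fc2 image features and an LSTM layer with 512 memory cells, trained on the VQA training set. The PyTorch sketch below mirrors just that configuration; the vocabulary size, word-embedding size, answer-vocabulary size, and the concatenation-based fusion are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of an LSTM VQA baseline matching the quoted configuration:
# 4096-d VGG fc2 image features and an LSTM with 512 memory cells.
# Vocabulary/answer sizes and the fusion scheme are assumptions.
import torch
import torch.nn as nn

class LstmVqaBaseline(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=512, hidden_dim=512,
                 img_dim=4096, num_answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # 512 memory cells
        self.img_proj = nn.Linear(img_dim, hidden_dim)                # project VGG fc2 features
        self.classifier = nn.Linear(hidden_dim * 2, num_answers)

    def forward(self, question_tokens, image_features):
        # question_tokens: (batch, seq_len) word indices
        # image_features: (batch, 4096) pre-extracted VGG fc2 activations
        _, (h_n, _) = self.lstm(self.embed(question_tokens))
        fused = torch.cat([h_n[-1], torch.relu(self.img_proj(image_features))], dim=1)
        return self.classifier(fused)

model = LstmVqaBaseline()
logits = model(torch.randint(0, 10000, (2, 12)), torch.randn(2, 4096))
print(logits.shape)  # torch.Size([2, 1000])
```

In the paper this joint-embedding model serves only as a baseline against which the knowledge-based Ahab method is compared.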