Explicit Knowledge-based Reasoning for Visual Question Answering
Authors: Peng Wang, Qi Wu, Chunhua Shen, Anthony Dick, Anton van den Hengel
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first evaluate different approaches automatically using simple string matching and Wu-Palmer similarity (WUPS) [Malinowski and Fritz, 2014a], in which the human answers are considered as ground truth. (A minimal WUPS sketch is given after the table.) |
| Researcher Affiliation | Academia | Peng Wang (1,2), Qi Wu (3), Chunhua Shen (2,3), Anthony Dick (2), Anton van den Hengel (2,3); 1: Northwestern Polytechnical University, China; 2: The University of Adelaide, Australia; 3: Australian Centre for Robotic Vision |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to its dataset ('https://bitbucket.org/sxjzwq1987/kb-vqa-dataset'), but does not state that source code for the proposed method is available. |
| Open Datasets | Yes | We select 700 of the validation images from the MS COCO [Lin et al., 2014] dataset... The LSTM is trained on the training set of VQA data [Antol et al., 2015]. (See the dataset-selection sketch after the table.) |
| Dataset Splits | No | The paper mentions selecting '700 of the validation images from the MS COCO' to build the KB-VQA dataset, but does not specify a validation split for KB-VQA or for training the primary model (Ahab). The LSTM baseline description refers only to a 'training set' and a 'test set'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software such as Quepy and NLTK ('Quepy begins by tagging each word in the question using NLTK [Bird et al., 2009]'), but does not give version numbers for these or any other software dependencies. (See the NLTK tagging sketch after the table.) |
| Experiment Setup | Yes | Specifically, we use the second fully-connected layer (4096-d) of a pre-trained VGG model as the image features, and the LSTM is trained on the training set of VQA data [Antol et al., 2015]. The LSTM layer contains 512 memory cells in each unit. (See the LSTM baseline sketch after the table.) |
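
The WUPS evaluation quoted under Research Type can be reproduced at a high level with NLTK's WordNet interface. The sketch below is a simplification, assuming single-word answers and the usual WUPS@0.9 down-weighting; it is not the authors' evaluation script.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # WordNet corpus is required once


def wup(word_a, word_b):
    """Best Wu-Palmer similarity over all synset pairs of two single words."""
    synsets_a, synsets_b = wn.synsets(word_a), wn.synsets(word_b)
    if not synsets_a or not synsets_b:
        return 0.0
    return max(sa.wup_similarity(sb) or 0.0
               for sa in synsets_a for sb in synsets_b)


def wups_score(prediction, ground_truth, threshold=0.9):
    """WUPS@threshold for one answer pair: similarities below the threshold
    are down-weighted by 0.1, as in the usual WUPS protocol."""
    score = wup(prediction, ground_truth)
    return score if score >= threshold else 0.1 * score


print(wups_score("dog", "puppy"))   # close in WordNet, scores high
print(wups_score("dog", "table"))   # distant, gets down-weighted
```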
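
The dataset construction quoted under Open Datasets starts from the MS COCO validation images. The snippet below is a hypothetical illustration of selecting 700 validation images with pycocotools; the paper does not say how its 700 images were chosen, so the random sampling and annotation path here are assumptions.

```python
import random

from pycocotools.coco import COCO

# Path to the COCO 2014 validation annotations is an assumption.
coco = COCO("annotations/instances_val2014.json")
all_image_ids = coco.getImgIds()

random.seed(0)  # illustrative only; the paper's selection criterion is unknown
selected_ids = random.sample(all_image_ids, 700)
selected_images = coco.loadImgs(selected_ids)
print(len(selected_images), selected_images[0]["file_name"])
```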
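
The Software Dependencies row quotes the paper's use of Quepy and NLTK without version numbers. The snippet below shows the kind of NLTK tagging step that quote refers to; the tokenizer and tagger resource names are current NLTK defaults, not versions reported by the paper.

```python
import nltk

# Resource names follow current NLTK releases; the paper does not pin versions.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

question = "How many dogs are in this image?"
tokens = nltk.word_tokenize(question)
print(nltk.pos_tag(tokens))
# e.g. [('How', 'WRB'), ('many', 'JJ'), ('dogs', 'NNS'), ('are', 'VBP'), ...]
```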
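
The Experiment Setup row describes the LSTM baseline: 4096-d features from the second fully-connected layer of a pre-trained VGG model, and an LSTM with 512 memory cells trained on VQA data. The PyTorch sketch below matches those two numbers, but everything else (vocabulary size, embedding size, elementwise-product fusion, answer vocabulary) is an assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LstmVqaBaseline(nn.Module):
    """Joint-embedding baseline: question LSTM plus projected VGG features."""

    def __init__(self, vocab_size=10000, embed_dim=300,
                 hidden_dim=512, img_dim=4096, num_answers=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # 512 memory cells, matching the setup quoted in the table.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Projects the 4096-d VGG fully-connected features to the LSTM size.
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, question_tokens, image_features):
        q = self.embed(question_tokens)                # (B, T, embed_dim)
        _, (h_n, _) = self.lstm(q)                     # (1, B, hidden_dim)
        v = torch.tanh(self.img_proj(image_features))  # (B, hidden_dim)
        fused = h_n.squeeze(0) * v                     # elementwise fusion (assumed)
        return self.classifier(fused)                  # answer logits


model = LstmVqaBaseline()
logits = model(torch.randint(0, 10000, (2, 12)), torch.randn(2, 4096))
print(logits.shape)  # torch.Size([2, 1000])
```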