KVQA: Knowledge-Aware Visual Question Answering

Authors: Sanket Shah, Anand Mishra, Naganand Yadati, Partha Pratim Talukdar

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Further, we also provide baseline performances using state-of-the-art methods on KVQA."
Researcher Affiliation | Academia | Sanket Shah (IIIT Hyderabad, India); Anand Mishra, Naganand Yadati, Partha Pratim Talukdar (Indian Institute of Science, Bangalore, India)
Pseudocode | No | The paper describes methods in text and uses a diagram (Figure 5) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper states, "Our dataset KVQA can be viewed and downloaded from our project website: http://malllabiisc.github.io/resources/kvqa/" and "All these preprocessed data and reference images are publicly available in our project website." This refers to data, not the methodology's code.
Open Datasets | Yes | "Our dataset KVQA can be viewed and downloaded from our project website: http://malllabiisc.github.io/resources/kvqa/. It contains 183K question-answer pairs about more than 18K persons and 24K images."
Dataset Splits | Yes | "We randomly divide 70%, 20% and 10% of images for train, test, and validation, respectively. KVQA dataset contains 17K, 5K and 2K images, and corresponding approximately 130K, 34K and 19K question-answer pairs in one split of train, validation and test, respectively." (A minimal split sketch appears after the table.)
Hardware Specification | No | The paper mentions training models but does not provide specific hardware details such as GPU/CPU models or memory used for experiments.
Software Dependencies | No | The paper mentions tools such as Facenet and Dexter, and a specific Wikidata dump date (05-05-2018), but does not provide version numbers for software dependencies such as the programming languages or libraries used for implementation and training.
Experiment Setup | Yes | The cross-entropy loss between predicted and ground-truth answers is minimized using stochastic gradient descent. To utilize multi-hop facts more effectively, memory layers are stacked and the question representation is refined at each layer as q^{k+1} = o^k + q^k, i.e., the sum of that layer's output representation and the previous layer's question representation. Three memory layers are stacked (K = 3), and the input and output embeddings are shared across layers. (A training sketch appears after the table.)
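
For concreteness, here is a minimal sketch of the image-level 70/20/10 split described in the Dataset Splits row. The function names (`split_image_ids`, `gather_qa_pairs`), the fixed seed, and the data layout (question-answer pairs keyed by image ID) are illustrative assumptions, not the authors' released preprocessing code.

```python
import random

def split_image_ids(image_ids, seed=0):
    """Randomly split image IDs 70/20/10 into train/test/validation,
    mirroring the image-level split described in the paper."""
    rng = random.Random(seed)  # hypothetical fixed seed for repeatability
    ids = list(image_ids)
    rng.shuffle(ids)
    n_train = int(0.7 * len(ids))
    n_test = int(0.2 * len(ids))
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "val": ids[n_train + n_test:],
    }

def gather_qa_pairs(splits, qa_by_image):
    """Assign each question-answer pair to the split of its image, so
    questions about the same image never leak across splits."""
    return {
        name: [qa for img in imgs for qa in qa_by_image.get(img, [])]
        for name, imgs in splits.items()
    }
```

Splitting at the image level (rather than the question level) is what makes the question-pair counts per split only approximate, since different images carry different numbers of questions.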
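The Experiment Setup row describes a stacked memory network with K = 3 layers, the update q^{k+1} = o^k + q^k, embeddings tied across layers, and cross-entropy loss minimized by SGD. The PyTorch sketch below illustrates that structure under stated assumptions: `StackedMemoryNet`, the embedding dimension, the learning rate, and the random placeholder tensors are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StackedMemoryNet(nn.Module):
    """Minimal sketch of K stacked memory layers. Fact and question
    encoders are assumed to produce d-dimensional vectors; the input
    and output memory embeddings are shared across layers, as the
    paper states."""

    def __init__(self, d, num_answers, num_layers=3):
        super().__init__()
        self.num_layers = num_layers
        self.in_proj = nn.Linear(d, d)   # input (key) embedding, shared across layers
        self.out_proj = nn.Linear(d, d)  # output (value) embedding, shared across layers
        self.classifier = nn.Linear(d, num_answers)

    def forward(self, q, facts):
        # q: (batch, d) question representation; facts: (batch, n_facts, d)
        for _ in range(self.num_layers):
            keys = self.in_proj(facts)                                   # (batch, n, d)
            values = self.out_proj(facts)                                # (batch, n, d)
            attn = torch.softmax((keys @ q.unsqueeze(-1)).squeeze(-1), dim=-1)
            o = (attn.unsqueeze(-1) * values).sum(dim=1)                 # output representation o^k
            q = o + q                                                    # q^{k+1} = o^k + q^k
        return self.classifier(q)

# Training-step sketch: cross-entropy loss minimized with SGD, as in the paper.
model = StackedMemoryNet(d=256, num_answers=1000)   # dimensions are illustrative
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

q = torch.randn(8, 256)          # placeholder question encodings
facts = torch.randn(8, 10, 256)  # placeholder multi-hop fact encodings
labels = torch.randint(0, 1000, (8,))

optimizer.zero_grad()
loss = criterion(model(q, facts), labels)
loss.backward()
optimizer.step()
```

Reusing `in_proj` and `out_proj` at every hop matches the paper's note that input and output embeddings are the same across layers, i.e., the layer-wise weight-tying scheme familiar from end-to-end memory networks.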