KVQA: Knowledge-Aware Visual Question Answering

Authors: Sanket Shah, Anand Mishra, Naganand Yadati, Partha Pratim Talukdar

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Further, we also provide baseline performances using state-of-the-art methods on KVQA."
Researcher Affiliation | Academia | Sanket Shah (IIIT Hyderabad, India); Anand Mishra, Naganand Yadati, Partha Pratim Talukdar (Indian Institute of Science, Bangalore, India)
Pseudocode | No | The paper describes methods in text and uses a diagram (Figure 5) but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper states, "Our dataset KVQA can be viewed and downloaded from our project website: http://malllabiisc.github.io/resources/kvqa/" and "All these preprocessed data and reference images are publicly available in our project website." This refers to data, not the methodology's code.
Open Datasets | Yes | "Our dataset KVQA can be viewed and downloaded from our project website: http://malllabiisc.github.io/resources/kvqa/. It contains 183K question-answer pairs about more than 18K persons and 24K images."
Dataset Splits | Yes | "We randomly divide 70%, 20% and 10% of images for train, test, and validation, respectively. KVQA dataset contains 17K, 5K and 2K images, and corresponding approximately 130K, 34K and 19K question-answer pairs in one split of train, validation and test, respectively." (A minimal split sketch appears after the table.)
Hardware Specification | No | The paper mentions training models but does not provide specific hardware details such as GPU/CPU models or memory used for experiments.
Software Dependencies | No | The paper mentions tools such as Facenet and Dexter, and a specific Wikidata dump date (05-05-2018), but does not provide version numbers for software dependencies such as the programming languages or libraries used for implementation and training.
Experiment Setup | Yes | The cross-entropy loss between predicted and ground-truth answers is minimized using stochastic gradient descent. To utilize multi-hop facts more effectively, memory layers are stacked and the question representation is refined at each layer as q^{k+1} = o^k + q^k, i.e., the sum of that layer's output representation and the previous layer's question representation. Three memory layers are stacked (K = 3), and the input and output embeddings are shared across layers. (A training sketch appears after the table.)
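
For concreteness, here is a minimal sketch of the image-level 70/20/10 split described in the Dataset Splits row. The function names (`split_image_ids`, `gather_qa_pairs`), the fixed seed, and the data layout (question-answer pairs keyed by image ID) are illustrative assumptions, not the authors' released preprocessing code.

```python
import random

def split_image_ids(image_ids, seed=0):
    """Randomly split image IDs 70/20/10 into train/test/validation,
    mirroring the image-level split described in the paper."""
    rng = random.Random(seed)  # hypothetical fixed seed for repeatability
    ids = list(image_ids)
    rng.shuffle(ids)
    n_train = int(0.7 * len(ids))
    n_test = int(0.2 * len(ids))
    return {
        "train": ids[:n_train],
        "test": ids[n_train:n_train + n_test],
        "val": ids[n_train + n_test:],
    }

def gather_qa_pairs(splits, qa_by_image):
    """Assign each question-answer pair to the split of its image, so
    questions about the same image never leak across splits."""
    return {
        name: [qa for img in imgs for qa in qa_by_image.get(img, [])]
        for name, imgs in splits.items()
    }
```

Splitting at the image level (rather than the question level) is what makes the question-pair counts per split only approximate, since different images carry different numbers of questions.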
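The Experiment Setup row describes a stacked memory network with K = 3 layers, the update q^{k+1} = o^k + q^k, embeddings tied across layers, and cross-entropy loss minimized by SGD. The PyTorch sketch below illustrates that structure under stated assumptions: `StackedMemoryNet`, the embedding dimension, the learning rate, and the random placeholder tensors are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StackedMemoryNet(nn.Module):
    """Minimal sketch of K stacked memory layers. Fact and question
    encoders are assumed to produce d-dimensional vectors; the input
    and output memory embeddings are shared across layers, as the
    paper states."""

    def __init__(self, d, num_answers, num_layers=3):
        super().__init__()
        self.num_layers = num_layers
        self.in_proj = nn.Linear(d, d)   # input (key) embedding, shared across layers
        self.out_proj = nn.Linear(d, d)  # output (value) embedding, shared across layers
        self.classifier = nn.Linear(d, num_answers)

    def forward(self, q, facts):
        # q: (batch, d) question representation; facts: (batch, n_facts, d)
        for _ in range(self.num_layers):
            keys = self.in_proj(facts)                                   # (batch, n, d)
            values = self.out_proj(facts)                                # (batch, n, d)
            attn = torch.softmax((keys @ q.unsqueeze(-1)).squeeze(-1), dim=-1)
            o = (attn.unsqueeze(-1) * values).sum(dim=1)                 # output representation o^k
            q = o + q                                                    # q^{k+1} = o^k + q^k
        return self.classifier(q)

# Training-step sketch: cross-entropy loss minimized with SGD, as in the paper.
model = StackedMemoryNet(d=256, num_answers=1000)   # dimensions are illustrative
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

q = torch.randn(8, 256)          # placeholder question encodings
facts = torch.randn(8, 10, 256)  # placeholder multi-hop fact encodings
labels = torch.randint(0, 1000, (8,))

optimizer.zero_grad()
loss = criterion(model(q, facts), labels)
loss.backward()
optimizer.step()
```

Reusing `in_proj` and `out_proj` at every hop matches the paper's note that input and output embeddings are the same across layers, i.e., the layer-wise weight-tying scheme familiar from end-to-end memory networks.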