KVQA: Knowledge-Aware Visual Question Answering
Authors: Sanket Shah, Anand Mishra, Naganand Yadati, Partha Pratim Talukdar (pp. 8876-8884)
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Further, we also provide baseline performances using state-of-the-art methods on KVQA. |
| Researcher Affiliation | Academia | Sanket Shah,1* Anand Mishra,2* Naganand Yadati,2 Partha Pratim Talukdar2 1IIIT Hyderabad, India, 2Indian Institute of Science, Bangalore, India |
| Pseudocode | No | The paper describes methods in text and uses a diagram (Figure 5) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | Our dataset KVQA can be viewed and downloaded from our project website: http://malllabiisc.github.io/resources/kvqa/. It contains 183K question-answer pairs about more than 18K persons and 24K images. All these preprocessed data and reference images are publicly available in our project website. This refers to data, not the methodology's code. |
| Open Datasets | Yes | Our dataset KVQA can be viewed and downloaded from our project website: http://malllabiisc.github.io/resources/kvqa/. It contains 183K question-answer pairs about more than 18K persons and 24K images. |
| Dataset Splits | Yes | We randomly divide 70%, 20% and 10% of images for train, test, and validation, respectively. KVQA dataset contains 17K, 5K and 2K images, and correspondingly approximately 130K, 34K and 19K question-answer pairs in one split of train, validation and test, respectively. |
| Hardware Specification | No | The paper mentions training models but does not provide specific hardware details such as GPU/CPU models or memory used for experiments. |
| Software Dependencies | No | The paper mentions using tools like Facenet and Dexter, and a specific Wikidata dump date (05-05-2018), but does not provide specific version numbers for software dependencies such as programming languages or libraries used for implementation and training. |
| Experiment Setup | Yes | The cross-entropy loss between predicted and ground-truth answers is used, and the loss is minimized using stochastic gradient descent. In order to utilize multi-hop facts more effectively, we stack memory layers and refine the question representation `q_{k+1}` at each layer as the sum of the output representation at that layer and the question representation at the previous layer. Note that we stack three memory layers (K = 3) in our implementation. Further, the input and output embeddings are the same across different layers. |
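The layer-stacking scheme quoted above (refining the question representation as `q_{k+1} = o_k + q_k` over K = 3 memory layers) can be sketched as follows. This is a minimal illustration, not the authors' code: the dot-product attention, embedding dimensions, and function names are assumptions; the paper's actual input/output embedding scheme may differ.

```python
import numpy as np

def memory_layer(q, memory_keys, memory_vals):
    """One memory hop: attend over memory slots with the current query.

    memory_keys / memory_vals: (num_slots, dim) arrays sharing the same
    embeddings, mirroring the paper's tied input/output embeddings.
    """
    scores = memory_keys @ q                 # inner-product match scores
    p = np.exp(scores - scores.max())        # numerically stable softmax
    p /= p.sum()
    return memory_vals.T @ p                 # attention-weighted output o_k

def stacked_memory_network(q0, memory_keys, memory_vals, num_layers=3):
    """Refine the question representation across stacked layers:
    q_{k+1} = o_k + q_k, as described in the experiment setup."""
    q = q0
    for _ in range(num_layers):
        o = memory_layer(q, memory_keys, memory_vals)
        q = o + q
    return q
```

With zero layers the query is returned unchanged, and each additional layer adds one attention readout over the (shared) memory embeddings, so multi-hop facts can accumulate into the final question representation before answer classification.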