Object Attribute Matters in Visual Question Answering
Authors: Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University, Changchun, China; (2) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (3) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; (4) Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method. For a detailed introduction to the datasets, please refer to Related Work. (Table 1 and the citations in Related Work confirm that all six datasets are publicly available.) Example citation: "Agrawal, A.; Batra, D.; Parikh, D.; and Kembhavi, A. 2018. Don't just assume; look and answer: Overcoming priors for visual question answering. In CVPR." (for VQA-CPv1/2) |
| Dataset Splits | Yes | Table 1 reports per-dataset statistics (QA pairs, images, image source): COCO-QA (118K, 123K, COCO); TDIUC (1.6M, 167K, COCO + VG); VQA-CPv1 (370K, 205K, COCO); VQA-CPv2 (603K, 219K, COCO); VQAv2 (1.1M, 204K, COCO); VQAvs (658K, 877K, COCO). "VQAv2 val" in Table 4 indicates use of the standard validation split. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU models, or memory specifications) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions various models and optimizers but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Our model is trained with the AdamW optimizer for 100 epochs. The self-attention function Gatt(x) in the module consists of 5 layers of self-attention. In the cross-attention and self-attention layers, the hidden dimension is 512 and the number of heads is 8. |
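
The Experiment Setup row pins down only the optimizer, epoch count, layer count, hidden size, and head count. Below is a minimal PyTorch sketch of a configuration matching those reported values; since the paper releases no code, this is a reconstruction, not the authors' implementation. Realizing Gatt(x) with `nn.TransformerEncoder`, as well as the learning rate, batch size, and input shapes, are all assumptions.

```python
import torch
import torch.nn as nn

class SelfAttentionModule(nn.Module):
    """Hypothetical stand-in for the paper's Gatt(x): a stack of
    5 self-attention layers with hidden size 512 and 8 heads,
    matching the figures reported in the Experiment Setup row."""

    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 5):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

model = SelfAttentionModule()
# The paper specifies AdamW but not the learning rate; the default is used here.
optimizer = torch.optim.AdamW(model.parameters())

# Skeleton of the reported 100-epoch training loop. The batch of random
# (batch, objects, hidden) features and the squared-output loss are
# placeholders standing in for the paper's actual data and objective.
for epoch in range(100):
    features = torch.randn(32, 36, 512)
    out = model(features)
    loss = out.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```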