Object Attribute Matters in Visual Question Answering

Authors: Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method."
Researcher Affiliation | Academia | (1) School of Artificial Intelligence, Jilin University, Changchun, China; (2) Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China; (3) School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China; (4) Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper contains neither an explicit statement about releasing source code nor a direct link to a code repository for the described methodology.
Open Datasets | Yes | "Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method. For the detailed introduction to the datasets, please refer to Related Work." Table 1 and the citations in Related Work confirm that the datasets are public. Example citation (for VQA-CPv1/2): "Agrawal, A.; Batra, D.; Parikh, D.; and Kembhavi, A. 2018. Don't just assume; look and answer: Overcoming priors for visual question answering. In CVPR."
Dataset Splits | Yes | Table 1 lists #QA pairs / #Images / Image Source per dataset: COCO-QA: 118K / 123K / COCO; TDIUC: 1.6M / 167K / COCO + VG; VQA-CPv1: 370K / 205K / COCO; VQA-CPv2: 603K / 219K / COCO; VQAv2: 1.1M / 204K / COCO; VQAvs: 658K / 877K / COCO. Table 4 reports results on "VQAv2 val", indicating use of the standard validation split.
Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU models, or memory specifications) used for running the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions various models and optimizers but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "Our model is trained by AdamW optimizer with 100 epochs. The self-attention function Gatt(x) in the module consists of 5 layers of self-attention. In the cross-attention and self-attention layers, the hidden layer dimension is 512, and the number of heads is 8."
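The Experiment Setup row pins down the attention hyperparameters: a 5-layer self-attention stack Gatt(x) with hidden dimension 512 and 8 heads. A minimal NumPy sketch of such a stack is below; the random projection weights stand in for learned parameters, and the residual connections are an assumption (the paper excerpt does not specify them), so this illustrates only the shapes and layer structure, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, heads=8, rng=None):
    """One self-attention layer over x of shape (seq_len, d_model)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    seq, d = x.shape
    dh = d // heads  # per-head dimension: 512 / 8 = 64
    # Random projections stand in for the learned Q, K, V, output weights.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    q = (x @ Wq).reshape(seq, heads, dh).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq, heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, heads, dh).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (heads, seq, seq).
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))
    out = (attn @ v).transpose(1, 0, 2).reshape(seq, d)
    return out @ Wo

def g_att(x, layers=5, heads=8):
    """Stack of self-attention layers; residuals are an assumption."""
    for i in range(layers):
        x = x + multi_head_self_attention(x, heads, np.random.default_rng(i))
    return x

# Example: 36 region features of dimension 512 in, same shape out.
features = np.random.default_rng(42).standard_normal((36, 512))
print(g_att(features).shape)  # (36, 512)
```

The key checkable property is that the stack is shape-preserving, so its output can feed the cross-attention layers (also 512-dimensional with 8 heads) described in the same row.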