Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Object Attribute Matters in Visual Question Answering
Authors: Peize Li, Qingyi Si, Peng Fu, Zheng Lin, Yan Wang
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method. |
| Researcher Affiliation | Academia | 1School of Artificial Intelligence, Jilin University, Changchun, China 2Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China 3School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China 4Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code or a direct link to a code repository for the methodology described. |
| Open Datasets | Yes | Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method. For the detailed introduction to the datasets, please refer to Related Work. (Referencing Table 1 and citations in Related Work confirms public datasets). Example citation: "Agrawal, A.; Batra, D.; Parikh, D.; and Kembhavi, A. 2018. Don't just assume; look and answer: Overcoming priors for visual question answering. In CVPR." (for VQA-CPv1/2) |
| Dataset Splits | Yes | Table 1 lists per-dataset statistics: COCO-QA: 118K QA pairs, 123K images (COCO); TDIUC: 1.6M QA pairs, 167K images (COCO + VG); VQA-CPv1: 370K QA pairs, 205K images (COCO); VQA-CPv2: 603K QA pairs, 219K images (COCO); VQAv2: 1.1M QA pairs, 204K images (COCO); VQAvs: 658K QA pairs, 877K images (COCO). "VQAv2 val" in Table 4 indicates use of standard validation splits. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU models, or memory specifications) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions various models and optimizers but does not provide specific version numbers for software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The model is trained with the AdamW optimizer for 100 epochs. The self-attention function Gatt(x) in the module consists of 5 layers of self-attention. In the cross-attention and self-attention layers, the hidden layer dimension is 512 and the number of heads is 8. |
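The reported setup (5 stacked self-attention layers, hidden dimension 512, 8 heads) can be sketched as follows. This is a minimal illustrative implementation in NumPy, not the authors' code: the weight initialisation, absence of layer norm/residuals, and sequence length are assumptions made purely to show the stated dimensions.

```python
import numpy as np

# Configuration reported in the paper's Experiment Setup row.
HIDDEN = 512   # hidden layer dimension
HEADS = 8      # number of attention heads
LAYERS = 5     # layers in the self-attention function Gatt(x)
HEAD_DIM = HIDDEN // HEADS  # 64 dims per head

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv, wo):
    """One multi-head self-attention layer over x of shape (seq, HIDDEN)."""
    seq = x.shape[0]
    # Project and split into heads: (HEADS, seq, HEAD_DIM).
    q = (x @ wq).reshape(seq, HEADS, HEAD_DIM).transpose(1, 0, 2)
    k = (x @ wk).reshape(seq, HEADS, HEAD_DIM).transpose(1, 0, 2)
    v = (x @ wv).reshape(seq, HEADS, HEAD_DIM).transpose(1, 0, 2)
    # Scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)
    out = softmax(scores) @ v  # (HEADS, seq, HEAD_DIM)
    # Merge heads back and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq, HIDDEN)
    return out @ wo

rng = np.random.default_rng(0)
x = rng.standard_normal((10, HIDDEN))  # 10 tokens (assumed sequence length)
for _ in range(LAYERS):
    # Fresh illustrative weights per layer; real layers would be trained.
    w = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02 for _ in range(4)]
    x = self_attention(x, *w)

print(x.shape)  # (10, 512)
```

In practice a stack like this would also include residual connections and normalisation, and the weights would be learned with AdamW as the paper states; the sketch only demonstrates the 512-dim, 8-head, 5-layer shape arithmetic.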