BOK-VQA: Bilingual Outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining

Authors: MinJun Kim, SeungWoo Song, YouHan Lee, Haneol Jang, KyungTae Lim

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, through in-depth analysis, we demonstrated the actual effect of the knowledge information contained in the constructed training data on VQA."
Researcher Affiliation | Collaboration | Hanbat National University; Kakao Brain; Seoul National University of Science and Technology
Pseudocode | No | The paper includes equations and architectural diagrams, but no explicitly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | No | The paper does not provide a direct link to its source code or explicitly state that the code for the described methodology is publicly available.
Open Datasets | Yes | "This research used datasets from The Open AI Dataset Project (AI-Hub) (No. 2022-데이터-위41, 2023-지능데이터-위93). In addition, considering the usage frequency, we incorporated 1,079 objects derived from ImageNet (Deng et al. 2009) and supplemented 32 additional relations. We assembled 282,533 triple knowledge entries comprising 1,579 objects and 42 relations from English ConceptNet and DBpedia."
Dataset Splits | Yes | The dataset was split into 60% for training and 20% each for validation and testing, with five-fold validation (see the split sketch below the table).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions models such as XLM-RoBERTa and ResNet50, but does not provide specific version numbers for the underlying software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation.
Experiment Setup | Yes | "All experiments were pretrained on English KB data using the ConvKB algorithm for 50,000 iterations." (See the ConvKB sketch below the table.)
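
For concreteness, the 60/20/20 split reported in the Dataset Splits row could be produced as in the minimal sketch below. This is an illustration, not the authors' procedure: the paper does not describe how its folds were drawn, and `sample_ids` is a placeholder for the actual BOK-VQA sample identifiers.

```python
import numpy as np
from sklearn.model_selection import train_test_split

sample_ids = np.arange(1000)  # placeholder; the real BOK-VQA sample count differs

# 60% train, then split the remaining 40% evenly into 20% val / 20% test.
train_ids, rest_ids = train_test_split(sample_ids, test_size=0.4, random_state=0)
val_ids, test_ids = train_test_split(rest_ids, test_size=0.5, random_state=0)

assert len(train_ids) == 600 and len(val_ids) == 200 and len(test_ids) == 200
# Repeating this with five different random_state values would yield a
# five-fold-style evaluation of the kind the paper reports.
```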
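The Experiment Setup row names ConvKB (Nguyen et al. 2018), which scores a knowledge triple by stacking the head, relation, and tail embeddings into a k×3 matrix and convolving it with 1×3 filters. Below is a minimal PyTorch sketch of that scoring function, not the authors' implementation; `embed_dim` and `n_filters` are assumed values, since the paper does not report its pretraining hyperparameters.

```python
import torch
import torch.nn as nn

class ConvKBScorer(nn.Module):
    """Minimal ConvKB-style triple scorer (Nguyen et al. 2018).

    embed_dim and n_filters are illustrative assumptions; the paper
    does not report the hyperparameters used for KB pretraining.
    """

    def __init__(self, n_entities, n_relations, embed_dim=100, n_filters=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, embed_dim)
        self.rel = nn.Embedding(n_relations, embed_dim)
        # Each 1x3 filter slides down the rows of the [h; r; t] matrix.
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(1, 3))
        self.fc = nn.Linear(embed_dim * n_filters, 1, bias=False)

    def forward(self, heads, rels, tails):
        # (batch, embed_dim, 3): head, relation, tail embeddings side by side.
        m = torch.stack([self.ent(heads), self.rel(rels), self.ent(tails)], dim=-1)
        feats = torch.relu(self.conv(m.unsqueeze(1)))  # (batch, n_filters, embed_dim, 1)
        return self.fc(feats.flatten(start_dim=1)).squeeze(-1)  # plausibility score

# Entity/relation counts taken from the dataset description above.
scorer = ConvKBScorer(n_entities=1579, n_relations=42)
score = scorer(torch.tensor([3]), torch.tensor([7]), torch.tensor([11]))
```

Pretraining as quoted would then optimize a triple-classification loss (e.g., soft-margin over corrupted triples, as in the original ConvKB paper) for 50,000 iterations before the graph embeddings are used in the VQA model.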