A Novel Framework for Robustness Analysis of Visual QA Models

Authors: Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, Bernard Ghanem

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we propose a flexible framework that focuses on the language part of VQA that uses semantically relevant questions, dubbed basic questions, acting as controllable noise to evaluate the robustness of VQA models. ... In this work, we propose a novel robustness measure Rscore and two large-scale basic question datasets (BQDs) in order to standardize robustness analysis for VQA models. To analyze our proposed framework, we will perform our experiments on six VQA models: LQI denoting LSTM Q+I (Antol et al. 2015), HAV denoting HieCoAtt (Alt, VGG19) and HAR denoting HieCoAtt (Alt, Resnet200) (Lu et al. 2016), MU denoting MUTAN without Attention and MUA denoting MUTAN with Attention (Ben-younes et al. 2017), and MLB denoting MLB with Attention (Kim et al. 2017). (A sketch of the basic-question perturbation appears after this table.)
Researcher Affiliation | Academia | Jia-Hong Huang (1,2), Cuong Duc Dao (1,*), Modar Alfadly (1,*), Bernard Ghanem (1); affiliations: (1) King Abdullah University of Science and Technology, (2) National Taiwan University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing the source code for the described methodology, nor does it include a direct link to a code repository.
Open Datasets | Yes | Our BQD is the combination of only unique questions from the training and validation sets of real images in the VQA dataset (Antol et al. 2015), which is a total of 186027 questions. In addition, these datasets contain 81434 testing images from the MS COCO dataset (Lin et al. 2014).
Dataset Splits | Yes | Our BQD is the combination of only unique questions from the training and validation sets of real images in the VQA dataset (Antol et al. 2015), which is a total of 186027 questions. On top of that, we will limit ourselves to the open-ended task on the test-dev partition from the 2017 VQA Challenge (Antol et al. 2015), denoted here as dev, unless otherwise specified, like using the test-std partition, denoted std.
Hardware Specification | No | The paper states that it 'used the resources of the Supercomputing Laboratory at KAUST' but does not provide specific details on the hardware, such as GPU or CPU models, used for the experiments.
Software Dependencies | No | The paper mentions various models and metrics (e.g., Skipthought, LASSO, BLEU) but does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | We set λ = 10^-6 and keep the top-k BQs for each MQ produced by solving Eq. 1, where k = 21. In Tables 2 and 3, we compare the six VQA models on GBQD and YNBQD and report their Rscore on only partition 1 with t = 5×10^-4 and m = 2×10^-1. (Ranking and scoring sketches appear after this table.)
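
Below are three illustrative sketches for the technical steps referenced in the table. They are editorial reconstructions under stated assumptions, not the authors' released code.

First, the perturbation idea: the framework appends top-ranked basic questions (BQs) to a main question (MQ) as controllable language noise and checks whether the model's answer changes. The `vqa_model` object, its `predict(image, question)` method, and the pre-ranked BQ list are hypothetical stand-ins; k is a free parameter here (the paper evaluates partitions of its 21 ranked BQs).

```python
# Minimal sketch of basic-question noise, assuming a hypothetical
# vqa_model with a predict(image, question) -> answer method.

def perturb_question(main_question, ranked_bqs, k=3):
    """Concatenate the k highest-ranked basic questions onto the main question."""
    return " ".join([main_question] + list(ranked_bqs[:k]))

def broken_by_noise(vqa_model, image, main_question, answer, ranked_bqs, k=3):
    """True if appending BQs flips a previously correct prediction."""
    clean_ok = vqa_model.predict(image, main_question) == answer
    noisy_ok = vqa_model.predict(image, perturb_question(main_question, ranked_bqs, k)) == answer
    return clean_ok and not noisy_ok
```

Second, the LASSO ranking behind Eq. 1, with the reported λ = 10^-6 and k = 21. The question embeddings are assumed to be precomputed (the paper mentions Skipthought vectors). Two caveats: scikit-learn's Lasso rescales the data-fit term by 1/(2n), so `alpha` only nominally equals the paper's λ, and the non-negativity constraint is an assumption about Eq. 1.

```python
import numpy as np
from sklearn.linear_model import Lasso

def rank_basic_questions(mq_vec, bq_vecs, lam=1e-6, k=21):
    """Rank candidate BQs by their LASSO weight in reconstructing the MQ.

    mq_vec:  (d,) embedding of the main question.
    bq_vecs: (d, N) matrix whose columns are embeddings of N candidate BQs.
    Returns the indices of the k highest-weight BQs.
    """
    lasso = Lasso(alpha=lam, positive=True, max_iter=10000)
    lasso.fit(bq_vecs, mq_vec)    # roughly: min ||A x - b||^2 + lam * ||x||_1, x >= 0
    weights = lasso.coef_         # one sparse weight per basic question
    return np.argsort(weights)[::-1][:k]
```

Third, the robustness measure. The exact Rscore definition is given in the paper and is not reproduced here; the clamped-linear mapping below is purely an assumed illustration consistent with the reported parameters t = 5×10^-4 and m = 2×10^-1 (no penalty for accuracy drops within t, zero score for drops beyond m).

```python
def rscore(acc_clean, acc_noisy, t=5e-4, m=2e-1):
    """Assumed clamped-linear robustness score; not the paper's exact formula."""
    drop = abs(acc_clean - acc_noisy)   # absolute accuracy difference
    if drop <= t:
        return 1.0                      # within tolerance: fully robust
    if drop >= m:
        return 0.0                      # beyond the ceiling: not robust
    return (m - drop) / (m - t)         # linear interpolation in between

# Example: a model dropping from 60% to 55% accuracy under BQ noise scores
# rscore(0.60, 0.55) ≈ 0.75 under this assumed mapping.
```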