A Novel Framework for Robustness Analysis of Visual QA Models

Authors: Jia-Hong Huang, Cuong Duc Dao, Modar Alfadly, Bernard Ghanem

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we propose a flexible framework that focuses on the language part of VQA that uses semantically relevant questions, dubbed basic questions, acting as controllable noise to evaluate the robustness of VQA models. ... In this work, we propose a novel robustness measure Rscore and two large-scale basic question datasets (BQDs) in order to standardize robustness analysis for VQA models. To analyze our proposed framework, we will perform our experiments on six VQA models: LQI denoting LSTM Q+I (Antol et al. 2015), HAV denoting HieCoAtt (Alt, VGG19) and HAR denoting HieCoAtt (Alt, Resnet200) (Lu et al. 2016), MU denoting MUTAN without Attention and MUA denoting MUTAN with Attention (Ben-younes et al. 2017), and MLB denoting MLB with Attention (Kim et al. 2017). (A sketch of the basic-question perturbation appears after this table.)
Researcher Affiliation | Academia | Jia-Hong Huang (1,2), Cuong Duc Dao (1,*), Modar Alfadly (1,*), Bernard Ghanem (1); affiliations: (1) King Abdullah University of Science and Technology, (2) National Taiwan University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing the source code for the described methodology, nor does it include a direct link to a code repository.
Open Datasets | Yes | Our BQD is the combination of only unique questions from the training and validation sets of real images in the VQA dataset (Antol et al. 2015), which is a total of 186027 questions. In addition, these datasets contain 81434 testing images from the MS COCO dataset (Lin et al. 2014).
Dataset Splits | Yes | Our BQD is the combination of only unique questions from the training and validation sets of real images in the VQA dataset (Antol et al. 2015), which is a total of 186027 questions. On top of that, we will limit ourselves to the open-ended task on the test-dev partition from the 2017 VQA Challenge (Antol et al. 2015), denoted here as dev, unless otherwise specified, like using the test-std partition, denoted std.
Hardware Specification | No | The paper states that it 'used the resources of the Supercomputing Laboratory at KAUST' but does not provide specific details on the hardware, such as GPU or CPU models, used for the experiments.
Software Dependencies | No | The paper mentions various models and metrics (e.g., Skipthought, LASSO, BLEU) but does not provide specific version numbers for any software dependencies or libraries used in the implementation.
Experiment Setup | Yes | We set λ = 10^-6 and keep the top-k BQs for each MQ produced by solving Eq. 1, where k = 21. In Tables 2 and 3, we compare the six VQA models on GBQD and YNBQD and report their Rscore on only partition 1 with t = 5×10^-4 and m = 2×10^-1. (Ranking and scoring sketches appear after this table.)
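
Below are three illustrative sketches for the technical steps referenced in the table. They are editorial reconstructions under stated assumptions, not the authors' released code.

First, the perturbation idea: the framework appends top-ranked basic questions (BQs) to a main question (MQ) as controllable language noise and checks whether the model's answer changes. The `vqa_model` object, its `predict(image, question)` method, and the pre-ranked BQ list are hypothetical stand-ins; k is a free parameter here (the paper evaluates partitions of its 21 ranked BQs).

```python
# Minimal sketch of basic-question noise, assuming a hypothetical
# vqa_model with a predict(image, question) -> answer method.

def perturb_question(main_question, ranked_bqs, k=3):
    """Concatenate the k highest-ranked basic questions onto the main question."""
    return " ".join([main_question] + list(ranked_bqs[:k]))

def broken_by_noise(vqa_model, image, main_question, answer, ranked_bqs, k=3):
    """True if appending BQs flips a previously correct prediction."""
    clean_ok = vqa_model.predict(image, main_question) == answer
    noisy_ok = vqa_model.predict(image, perturb_question(main_question, ranked_bqs, k)) == answer
    return clean_ok and not noisy_ok
```

Second, the LASSO ranking behind Eq. 1, with the reported λ = 10^-6 and k = 21. The question embeddings are assumed to be precomputed (the paper mentions Skipthought vectors). Two caveats: scikit-learn's Lasso rescales the data-fit term by 1/(2n), so `alpha` only nominally equals the paper's λ, and the non-negativity constraint is an assumption about Eq. 1.

```python
import numpy as np
from sklearn.linear_model import Lasso

def rank_basic_questions(mq_vec, bq_vecs, lam=1e-6, k=21):
    """Rank candidate BQs by their LASSO weight in reconstructing the MQ.

    mq_vec:  (d,) embedding of the main question.
    bq_vecs: (d, N) matrix whose columns are embeddings of N candidate BQs.
    Returns the indices of the k highest-weight BQs.
    """
    lasso = Lasso(alpha=lam, positive=True, max_iter=10000)
    lasso.fit(bq_vecs, mq_vec)    # roughly: min ||A x - b||^2 + lam * ||x||_1, x >= 0
    weights = lasso.coef_         # one sparse weight per basic question
    return np.argsort(weights)[::-1][:k]
```

Third, the robustness measure. The exact Rscore definition is given in the paper and is not reproduced here; the clamped-linear mapping below is purely an assumed illustration consistent with the reported parameters t = 5×10^-4 and m = 2×10^-1 (no penalty for accuracy drops within t, zero score for drops beyond m).

```python
def rscore(acc_clean, acc_noisy, t=5e-4, m=2e-1):
    """Assumed clamped-linear robustness score; not the paper's exact formula."""
    drop = abs(acc_clean - acc_noisy)   # absolute accuracy difference
    if drop <= t:
        return 1.0                      # within tolerance: fully robust
    if drop >= m:
        return 0.0                      # beyond the ceiling: not robust
    return (m - drop) / (m - t)         # linear interpolation in between

# Example: a model dropping from 60% to 55% accuracy under BQ noise scores
# rscore(0.60, 0.55) ≈ 0.75 under this assumed mapping.
```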