Multi-Modal Answer Validation for Knowledge-Based VQA

Authors: Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi

AAAI 2022, pp. 2712-2721

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results.
Researcher Affiliation | Collaboration | Jialin Wu (1), Jiasen Lu (2), Ashish Sabharwal (2), Roozbeh Mottaghi (2); (1) The University of Texas at Austin, (2) Allen Institute for AI; jialinwu@utexas.edu, {jiasenl, ashishs, roozbehm}@allenai.org
Pseudocode | No | The paper describes the framework steps but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available at https://github.com/jialinwu17/MAVEX
Open Datasets | Yes | We evaluate MAVEx on OK-VQA (Marino et al. 2019), the largest knowledge-based VQA dataset to date.
Dataset Splits | Yes | We evaluate MAVEx on OK-VQA (Marino et al. 2019), the largest knowledge-based VQA dataset to date. The dataset contains 14,031 images and 14,055 questions... We use the finetuned model to extract the top 5 answers for each question in the training and test set. (A candidate-extraction sketch follows the table.)
Hardware Specification | Yes | We use Pytorch 1.4 on a single TITAN V GPU with 12GB memory for each run, and it generally costs 22 hours to train a single model.
Software Dependencies | No | The paper mentions 'Pytorch 1.4' but does not provide version numbers for other significant software dependencies such as AllenNLP, the T5 model, Mask R-CNN, or the specific BERT/TinyBERT implementations used.
Experiment Setup | Yes | We finetune the ViLBERT-multi-task model on OK-VQA using the default configuration for 150 epochs for answer candidate generation... We train the system for 75 epochs using a learning rate of 2e-5 for the ViLBERT parameters and 5e-5 for the additional parameters introduced in the validation module... The number of hidden units in the multi-head attention modules is set to 512. (A training-configuration sketch also follows the table.)
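The Dataset Splits row notes that the finetuned model extracts the top 5 answers per question as candidates. The snippet below is a minimal sketch of that selection step, assuming a generic PyTorch VQA classifier that scores a fixed answer vocabulary; `model`, `answer_vocab`, and `extract_top_answers` are illustrative names, not identifiers from the MAVEx codebase.

```python
import torch

def extract_top_answers(model, image_features, question_tokens, answer_vocab, k=5):
    """Return the top-k answer candidates (string, score) for one question.

    Assumes a hypothetical finetuned VQA model that maps
    (image_features, question_tokens) to logits over `answer_vocab`.
    """
    model.eval()
    with torch.no_grad():
        logits = model(image_features, question_tokens)   # shape: (1, len(answer_vocab))
        scores = torch.sigmoid(logits).squeeze(0)          # VQA-style multi-label scores
        top_scores, top_idx = scores.topk(k)
    return [(answer_vocab[i], s.item()) for i, s in zip(top_idx.tolist(), top_scores)]
```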
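The Experiment Setup row lists two learning rates (2e-5 for the ViLBERT backbone, 5e-5 for the parameters added by the validation module), 75 training epochs, and 512 hidden units in the multi-head attention modules. The sketch below shows how such per-group learning rates could be wired up in PyTorch; the `vilbert` and `validation_module` attribute names, the AdamW optimizer, and the head count are assumptions, not details reported in the paper.

```python
import torch
from torch import nn
from torch.optim import AdamW

HIDDEN_UNITS = 512    # hidden size of the multi-head attention modules (from the paper)
EPOCHS = 75           # validation-module training epochs (from the paper)
LR_BACKBONE = 2e-5    # learning rate for the ViLBERT parameters (from the paper)
LR_NEW = 5e-5         # learning rate for the added validation-module parameters (from the paper)

def build_optimizer(model: nn.Module) -> AdamW:
    """Separate learning rates for the pretrained backbone and the new parameters.

    Assumes the model exposes `vilbert` (pretrained backbone) and
    `validation_module` (newly introduced parameters) as attributes.
    """
    param_groups = [
        {"params": model.vilbert.parameters(), "lr": LR_BACKBONE},
        {"params": model.validation_module.parameters(), "lr": LR_NEW},
    ]
    return AdamW(param_groups)

# Example of the attention dimensioning mentioned in the setup;
# the number of heads is an assumption, only embed_dim=512 is reported.
attention = nn.MultiheadAttention(embed_dim=HIDDEN_UNITS, num_heads=8)
```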