Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-Modal Answer Validation for Knowledge-Based VQA
Authors: Jialin Wu, Jiasen Lu, Ashish Sabharwal, Roozbeh Mottaghi
Pages: 2712-2721
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results. |
| Researcher Affiliation | Collaboration | Jialin Wu¹, Jiasen Lu², Ashish Sabharwal², Roozbeh Mottaghi²; ¹The University of Texas at Austin, ²Allen Institute for AI; EMAIL, EMAIL |
| Pseudocode | No | The paper describes the framework steps but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/jialinwu17/MAVEX |
| Open Datasets | Yes | We evaluate MAVEx on OK-VQA (Marino et al. 2019), the largest knowledge-based VQA dataset to date. |
| Dataset Splits | Yes | We evaluate MAVEx on OK-VQA (Marino et al. 2019), the largest knowledge-based VQA dataset at present. The dataset contains 14,031 images and 14,055 questions... We use the finetuned model to extract the top 5 answers for each question in the training and test set. |
| Hardware Specification | Yes | We use Pytorch 1.4 on a single TITAN V GPU with 12M memory for each run, and it generally costs 22 hours to train a single model. |
| Software Dependencies | No | The paper mentions 'Pytorch 1.4' but does not provide version numbers for other significant software dependencies such as AllenNLP, the T5 model, Mask R-CNN, or the specific BERT/TinyBERT implementations used. |
| Experiment Setup | Yes | We finetune the ViLBERT-multi-task model on OK-VQA using the default configuration for 150 epochs for answer candidate generation... We train the system for 75 epochs using a learning rate of 2e-5 for the ViLBERT parameters and 5e-5 for the additional parameters introduced in the validation module... The number of hidden units in the multi-head attention modules is set to 512. |
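
The Experiment Setup row quotes two learning rates: 2e-5 for the ViLBERT parameters and 5e-5 for the additional validation-module parameters. The sketch below shows one way such a split could be expressed as optimizer parameter groups. It is an illustration only, not the authors' code: the function name `build_param_groups` and the `vilbert.` name prefix are hypothetical, and the placeholder strings stand in for real tensors.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Learning rates quoted in the Experiment Setup row.
VILBERT_LR = 2e-5     # ViLBERT parameters
VALIDATION_LR = 5e-5  # additional validation-module parameters


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    backbone_prefix: str = "vilbert.",  # hypothetical naming convention
) -> List[Dict[str, Any]]:
    """Split (name, parameter) pairs into two groups with separate learning rates."""
    backbone, extra = [], []
    for name, param in named_params:
        # Parameters whose name starts with the prefix go to the backbone group.
        (backbone if name.startswith(backbone_prefix) else extra).append(param)
    return [
        {"params": backbone, "lr": VILBERT_LR},
        {"params": extra, "lr": VALIDATION_LR},
    ]


if __name__ == "__main__":
    # Placeholder names and values standing in for real model parameters.
    demo = [("vilbert.encoder.weight", "w0"), ("validation.attn.weight", "w1")]
    for group in build_param_groups(demo):
        print(group["lr"], len(group["params"]))
```

With PyTorch, a list in this shape can be passed directly to an optimizer constructor, for example `torch.optim.AdamW(build_param_groups(model.named_parameters()))`. The choice of optimizer here is an assumption; the summary above does not state which one the paper used.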