Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering

Authors: Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Das Gupta, Anand Mishra

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset, namely RETVQA and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.
Researcher Affiliation | Collaboration | Abhirama Subramanyam Penamakuri¹, Manish Gupta², Mithun Das Gupta², Anand Mishra¹; ¹Indian Institute of Technology Jodhpur, ²Microsoft
Pseudocode | No | The paper includes a system overview diagram (Figure 4) but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We make our data and implementation publicly available.¹ ¹https://vl2g.github.io/projects/retvqa/
Open Datasets | Yes | To this end, we present a derived dataset prepared from Visual Genome [Krishna et al., 2017], leveraging its questions and annotations of images. ... We make our data and implementation publicly available.¹
Dataset Splits | Yes | Train set questions: 334K (80%); Val set questions: 41K (10%); Test set questions: 41K (10%)
Hardware Specification | Yes | Our relevance encoder and MI-BART were trained using 3 Nvidia RTX A6000 GPUs with a batch size of 96 and 256 while training and a batch size of 360 and 480 during testing, respectively.
Software Dependencies | No | We have implemented our framework in PyTorch [Paszke et al., 2019] and Hugging Face's transformers [Wolf et al., 2020] library. While these libraries are mentioned with their publication years, specific version numbers (e.g., PyTorch 1.9, transformers 4.0) are not provided.
Experiment Setup | Yes | We pretrain our relevance encoder on MS-COCO [Lin et al., 2014] with a constant learning rate of 1e-4 using Adam optimizer [Kingma and Ba, 2015]. Using the same optimiser, we finetune the relevance encoder on both datasets with a constant learning rate of 2e-5. ... we further finetune MI-BART on a multi-image QA task with a learning rate of 5e-5 using Adam optimizer with a linear warm-up of 10% of the total steps.
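
The Experiment Setup entry quotes a learning rate of 5e-5 with a linear warm-up over 10% of the total steps for MI-BART finetuning. The following is a minimal sketch of what such a schedule could look like, not the authors' implementation. The function name `warmup_lr` and its parameters are hypothetical, and holding the rate constant after warm-up is an assumption, since the quoted text does not say whether the rate decays afterwards.

```python
def warmup_lr(step: int, total_steps: int,
              peak_lr: float = 5e-5, warmup_frac: float = 0.10) -> float:
    """Return the learning rate for a 0-indexed optimizer step.

    Hypothetical helper: the rate rises linearly to peak_lr over the first
    warmup_frac of total_steps, then stays at peak_lr (an assumption; the
    source does not state the post-warm-up behaviour).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Number of warm-up steps, at least 1 to avoid division by zero.
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp: step 0 gets peak_lr / warmup_steps,
        # the last warm-up step gets peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


if __name__ == "__main__":
    total = 1000  # illustrative step count, not taken from the paper
    for s in (0, 49, 99, 100, 999):
        print(s, warmup_lr(s, total))
```

With 1000 total steps, the first 100 steps form the ramp and every later step uses the full 5e-5. In a PyTorch training loop, a function of this shape would typically be adapted into a multiplier for `torch.optim.lr_scheduler.LambdaLR`, which expects a factor relative to the optimizer's base rate instead of an absolute value.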