REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Authors: Yuanze Lin, Yujia Xie, Dongdong Chen, Yichong Xu, Chenguang Zhu, Lu Yuan

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments on the standard OK-VQA dataset and achieve new state-of-the-art performance, i.e., 58.0% accuracy, surpassing the previous state-of-the-art method by a large margin (+3.6%). We also conduct detailed analysis and show the necessity of regional information in different framework components for knowledge-based VQA.
Researcher Affiliation | Collaboration | University of Washington; Microsoft. Emails: yuanze@uw.edu; {yujiaxie, dochen, yicxu}@microsoft.com
Pseudocode | No | The paper describes the method using equations and text, but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code is publicly available at https://github.com/yzleroy/REVIVE.
Open Datasets | Yes | OK-VQA dataset [22] is selected for evaluation, which is currently the largest knowledge-based VQA dataset.
Dataset Splits | No | The paper states 'The training and testing split consist of 9009 and 5046 samples respectively' but does not explicitly mention a validation split or its size. (See the split-checking sketch below the table.)
Hardware Specification | Yes | We use 4 NVIDIA V100 32GB to train models for 10K steps, with a batch size of 8.
Software Dependencies | No | The paper mentions specific pre-trained models like 'GLIP-T', 'VinVL-Large', 'CLIP model (ViT-B/16 variant)', 'T5 model', and 'GPT-3', but does not provide specific version numbers for the underlying software libraries or environments (e.g., PyTorch version, Python version). (See the environment sketch below the table.)
Experiment Setup | Yes | We use 4 NVIDIA V100 32GB to train models for 10K steps, with a batch size of 8. The learning rate is 8e-5 and AdamW [19] is chosen as the optimizer. The warm-up steps are 1K and the trained models are evaluated every 500 steps. (See the training-configuration sketch below the table.)
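
The Dataset Splits row flags that only train/test sizes are documented. Below is a minimal sketch, assuming the annotation file names of the public OK-VQA release, of how a reproducer might verify the reported 9009/5046 counts and carve out their own validation subset; the 10% hold-out fraction and the random seed are assumptions, not values from the paper.

```python
import json
import random

# File names follow the public OK-VQA release; adjust paths as needed (assumption).
TRAIN_ANN = "OpenEnded_mscoco_train2014_questions.json"
TEST_ANN = "OpenEnded_mscoco_val2014_questions.json"

def load_questions(path):
    """Load the question list from an OK-VQA annotation JSON file."""
    with open(path) as f:
        return json.load(f)["questions"]

train_qs = load_questions(TRAIN_ANN)
test_qs = load_questions(TEST_ANN)

# Counts reported in the paper: 9009 train / 5046 test.
assert len(train_qs) == 9009, f"unexpected train size: {len(train_qs)}"
assert len(test_qs) == 5046, f"unexpected test size: {len(test_qs)}"

# The paper does not document a validation split, so a reproducer must define
# one; a 10% hold-out from the training set with seed 0 is an assumption here.
random.seed(0)
random.shuffle(train_qs)
n_val = len(train_qs) // 10
val_qs, train_qs = train_qs[:n_val], train_qs[n_val:]
print(f"train={len(train_qs)}, val={len(val_qs)}, test={len(test_qs)}")
```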
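
Because no library or environment versions are stated, a reproducer has to pin an environment themselves. The sketch below shows one possible way to load two of the named pre-trained components, CLIP ViT-B/16 and a T5 backbone, through Hugging Face transformers; the checkpoint identifiers and the choice of t5-large are assumptions, and the GLIP-T and VinVL-Large detectors are distributed through their own repositories.

```python
# Minimal environment sketch; the paper pins no versions, so the
# transformers/torch versions and checkpoint IDs below are assumptions.
import torch
from transformers import (
    CLIPModel,
    CLIPProcessor,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

# CLIP ViT-B/16 variant mentioned in the paper.
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# T5 answer-generation backbone; the exact size is not restated here,
# so "t5-large" is a placeholder (assumption).
t5_tokenizer = T5Tokenizer.from_pretrained("t5-large")
t5_model = T5ForConditionalGeneration.from_pretrained("t5-large")

# GLIP-T and VinVL-Large detector checkpoints are released in their own
# repositories rather than on the Hugging Face hub and must be set up separately.
print("torch version:", torch.__version__)
```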
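
The reported hyperparameters map directly onto a standard optimizer and scheduler configuration. The following is a minimal sketch of that setup in PyTorch, using a tiny stand-in model so it runs end to end; the stand-in model, the dummy data, and the linear decay after warm-up are assumptions, since the paper states only the warm-up length, total steps, batch size, learning rate, and evaluation interval.

```python
import torch
import torch.nn as nn
from transformers import get_linear_schedule_with_warmup

# Hyperparameters quoted in the paper's experiment setup.
TOTAL_STEPS = 10_000
WARMUP_STEPS = 1_000
BATCH_SIZE = 8            # per the paper, on 4x NVIDIA V100 32GB
LEARNING_RATE = 8e-5
EVAL_EVERY = 500

# Tiny stand-in model and data so the sketch runs end to end; the real model
# is the paper's T5-based answer generator (placeholders only, by assumption).
model = nn.Linear(16, 1)
dummy_inputs = torch.randn(BATCH_SIZE, 16)
dummy_targets = torch.randn(BATCH_SIZE, 1)

optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
# Linear decay after warm-up is an assumption; the paper states only the
# warm-up length (1K steps) and the total number of steps (10K).
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=WARMUP_STEPS, num_training_steps=TOTAL_STEPS
)

for step in range(1, TOTAL_STEPS + 1):
    loss = nn.functional.mse_loss(model(dummy_inputs), dummy_targets)
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    if step % EVAL_EVERY == 0:
        # The paper evaluates trained models every 500 steps; a real run
        # would call its OK-VQA evaluation routine here.
        pass
```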