Exploring Question Decomposition for Zero-Shot VQA

Authors: Zaid Khan, Vijay Kumar B G, Samuel Schulter, Manmohan Chandraker, Yun Fu

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments across three domains (art, natural images, medical), eight datasets, three model families, and model sizes ranging from 80M to 11B parameters. |
| Researcher Affiliation | Collaboration | Northeastern University, NEC Laboratories America, UC San Diego |
| Pseudocode | Yes | Figure 3: Pseudocode for selective decomposition. |
| Open Source Code | No | The paper provides a 'Project Site: https://zaidkhan.me/decomposition-0shot-vqa/' but does not explicitly state that the source code for the methodology is available there, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | The paper uses and cites several publicly available datasets, including VQA-Introspect [32], A-OKVQA [35], ArtVQA [36], OK-VQA [37], SLAKE [20], PathVQA [22], VQA-RAD [21], and Winoground [23]. |
| Dataset Splits | Yes | The validation split of VQA-Introspect (22k reasoning questions with their associated decompositions) is used. |
| Hardware Specification | Yes | Experiments are run on a combination of A6000s and TPUv3s. |
| Software Dependencies | No | The paper mentions using BLIP-2 and FLAN-T5 models, but does not provide specific version numbers for software dependencies such as programming languages, libraries (e.g., PyTorch, TensorFlow), or CUDA versions. |
| Experiment Setup | Yes | The paper describes the prompt structure used for in-context learning (`exemplar = "Context: is the sky blue? no. are there clouds in the sky? yes. Question: what weather is likely? Short answer: rain"`; `prompt = exemplar + "Context: {subquestion}? {subanswer}. Question: {question}? Short answer:"`) and introduces a confidence threshold g as an extra hyperparameter in the selective decomposition procedure. |
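The prompt template quoted in the Experiment Setup row can be sketched as a small helper. The exemplar string is verbatim from the paper; the function name and the usage values below are our own illustrative choices, not from the source.

```python
# Verbatim exemplar from the paper's in-context learning prompt.
EXEMPLAR = (
    "Context: is the sky blue? no. are there clouds in the sky? yes. "
    "Question: what weather is likely? Short answer: rain"
)

def build_prompt(question: str, subquestion: str, subanswer: str) -> str:
    """Prepend the fixed exemplar, then supply the sub-question and its
    answer as context before asking the original reasoning question."""
    return (
        EXEMPLAR
        + f" Context: {subquestion}? {subanswer}. "
        + f"Question: {question}? Short answer:"
    )

# Illustrative usage (hypothetical question/sub-question pair):
print(build_prompt("what season is it", "is there snow on the ground", "yes"))
```

The single-exemplar format mirrors the paper's description: the decomposition's sub-question and sub-answer are injected as "Context" so the frozen VQA model can condition its short answer on them.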
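The selective decomposition procedure (Figure 3 of the paper, with confidence threshold g) can be sketched as follows. This is a minimal sketch under our own assumptions about the interfaces: `vqa_answer` and `decompose` are hypothetical callables standing in for the frozen VQA model and the decomposition step, and the default g value is illustrative.

```python
def selective_decompose(question, image, vqa_answer, decompose, g=0.5):
    """Selective decomposition sketch: answer directly, and only fall back
    to decomposition when the model's confidence is below threshold g.

    vqa_answer(image, prompt) -> (answer, confidence)  # hypothetical interface
    decompose(image, question) -> prompt string with sub-question context
    """
    answer, confidence = vqa_answer(image, question)
    if confidence >= g:
        return answer  # confident direct answer; skip decomposition
    # Low confidence: build a context-augmented prompt and re-answer.
    context_prompt = decompose(image, question)
    answer, _ = vqa_answer(image, context_prompt)
    return answer
```

The design point, per the paper's framing, is that decomposition is applied only where it is likely to help, so g trades off the cost of extra model calls against accuracy on hard questions.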