Exploring Question Decomposition for Zero-Shot VQA
Authors: Zaid Khan, Vijay Kumar B G, Samuel Schulter, Manmohan Chandraker, Yun Fu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments across three domains (art, natural images, medical), eight datasets, three model families, and model sizes ranging from 80M to 11B parameters. |
| Researcher Affiliation | Collaboration | 1 Northeastern University, 2 NEC Laboratories America, 3 UC San Diego |
| Pseudocode | Yes | Figure 3: Pseudocode for selective decomposition. |
| Open Source Code | No | The paper provides a 'Project Site: https://zaidkhan.me/decomposition-0shot-vqa/' but does not explicitly state that the source code for the methodology is available there, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | The paper uses and cites several publicly available datasets, including VQA-Introspect [32], A-OKVQA [35], Art VQA [36], OK-VQA [37], SLAKE [20], PathVQA [22], VQA-RAD [21], and Winoground [23]. |
| Dataset Splits | Yes | The dataset is the validation split of VQA-Introspect (22k reasoning questions with their associated decompositions). |
| Hardware Specification | Yes | Experiments are run on a combination of A6000s and TPUv3s. |
| Software Dependencies | No | The paper mentions using BLIP-2 and FLAN-T5 models, but does not provide specific version numbers for software dependencies such as programming languages, libraries (e.g., PyTorch, TensorFlow), or CUDA versions. |
| Experiment Setup | Yes | The paper describes the prompt structure used for in-context learning ('exemplar = "Context: is the sky blue? no. are there clouds in the sky? yes. Question: what weather is likely? Short answer: rain"; prompt = exemplar + "Context: {subquestion}? {subanswer}. Question: {question}? Short answer:"') and introduces 'confidence threshold g' as an extra hyperparameter in the selective decomposition procedure. |
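
The prompt template quoted in the Experiment Setup row can be written out as a small Python helper. The sketch below is illustrative only and is not the authors' code. The names `build_prompt`, `selective_decomposition`, `answer`, and `decompose` are hypothetical, and the image input is left out. The control flow is an assumption: the question is decomposed only when the model's confidence in its direct answer falls below the threshold. The table says only that a confidence threshold is a hyperparameter of the procedure.

```python
from typing import Callable, Tuple

# Exemplar string as quoted in the Experiment Setup row.
EXEMPLAR = (
    "Context: is the sky blue? no. are there clouds in the sky? yes. "
    "Question: what weather is likely? Short answer: rain"
)


def build_prompt(subquestion: str, subanswer: str, question: str,
                 separator: str = "") -> str:
    """Fill the quoted template: exemplar + context + question.

    The quoted text shows no separator between the exemplar and the new
    context, so the default is an empty string. A separator may have been
    lost in extraction; that is an assumption, hence the parameter.
    """
    # Drop trailing punctuation so the template's own "?" and "." are not doubled.
    subquestion = subquestion.strip().rstrip("?")
    subanswer = subanswer.strip().rstrip(".")
    question = question.strip().rstrip("?")
    return (
        EXEMPLAR
        + separator
        + f"Context: {subquestion}? {subanswer}. "
        + f"Question: {question}? Short answer:"
    )


def selective_decomposition(
    question: str,
    answer: Callable[[str], Tuple[str, float]],    # hypothetical: prompt -> (answer, confidence)
    decompose: Callable[[str], Tuple[str, str]],   # hypothetical: question -> (subquestion, subanswer)
    threshold: float,
) -> str:
    """Assumed logic: keep the direct answer unless confidence is below the threshold."""
    direct_answer, confidence = answer(question)
    if confidence >= threshold:
        return direct_answer
    # Low confidence: ask again with the sub-question and its answer as context.
    subquestion, subanswer = decompose(question)
    revised_answer, _ = answer(build_prompt(subquestion, subanswer, question))
    return revised_answer
```

In practice `answer` would wrap a vision-language model such as BLIP-2 and `decompose` would generate and answer a sub-question. Both are left as plain callables here so the control flow can be read, and tested with stubs, independently of any particular model library.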