Declaration-based Prompt Tuning for Visual Question Answering

Authors: Yuhang Liu, Wei Wei, Daowan Peng, Feida Zhu

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the GQA dataset show that DPT outperforms its fine-tuned counterpart by a large margin in accuracy, in both the fully-supervised (2.68%) and zero-shot/few-shot (over 31%) settings.
Researcher Affiliation | Collaboration | Yuhang Liu (1,2), Wei Wei (1,2), Daowan Peng (1,2), and Feida Zhu (3). (1) Cognitive Computing and Intelligent Information Processing (CCIIP) Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, China; (2) Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL), China; (3) School of Computing and Information Systems, Singapore Management University, Singapore.
Pseudocode | No | The paper describes its methods through textual descriptions and equations (Equations 1-14) and a framework diagram (Figure 2), but it contains no explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor structured steps formatted like code.
Open Source Code | No | "All the data and codes will be available to facilitate future research."
Open Datasets | Yes | "Datasets. GQA [Hudson and Manning, 2019a] and VQA v2.0 [Agrawal et al., 2015] are used to build declaration generation dataset and evaluate our proposed methods on VQA task. More details are provided in the Appendix."
Dataset Splits | Yes | "For a deeper understanding of DPT, we further conduct the ablation studies on the local validation split of GQA and VQA v2.0 datasets (testdev on GQA and val on VQA v2.0)."
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, memory, or other system specifications.
Software Dependencies | No | The paper mentions the models and architectures used (e.g., 'T5-small', 'VinVL'), but it does not specify versions for general software dependencies or libraries such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | "The number of answers used for ITM K is set to 8. For fair comparison, we follow the same training settings as reported in the previous works in the following experiments. The details of hyper-parameters are reported in the Appendix."