Detection-Based Intermediate Supervision for Visual Question Answering
Authors: Yuhang Liu, Daowan Peng, Wei Wei, Yuanyuan Fu, Wenfeng Xie, Dangyang Chen
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of our proposed DIS, showcasing both improved accuracy and state-of-the-art reasoning consistency compared to prior approaches (see also the "Experiments" section). |
| Researcher Affiliation | Collaboration | (1) CCIIP Lab, School of Computer Science and Technology, Huazhong University of Science and Technology; (2) Joint Laboratory of HUST and Pingan Property & Casualty Research (HPL); (3) Ping An Property & Casualty Insurance Company of China, Ltd.; (4) ByteDance Inc. |
| Pseudocode | No | The paper describes methods and processes through text and figures, but does not contain a formally labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | The code is available at https://github.com/CCIIPLab/DIS. |
| Open Datasets | Yes | To evaluate the answer prediction performance and answering consistency, the reported results in the following sections are evaluated on the widely used GQA (Hudson and Manning 2019b) dataset, and its variant GQA-Sub (Jing et al. 2022b). |
| Dataset Splits | Yes | GQA-Sub (Jing et al. 2022b) is derived from the well-organized GQA dataset, and creates sub-questions for the train and val splits, thereby enabling quantitative evaluation of reasoning consistency; all models are trained on the train+val split and evaluated on the testdev split. |
| Hardware Specification | No | The paper does not specify any hardware details such as specific GPU models, CPU models, or cloud computing instance types used for running experiments. |
| Software Dependencies | No | T5-base is utilized for text transformation... We exploit the AdamW optimizer... The paper names specific models and optimizers but gives no version numbers for software dependencies such as Python, PyTorch, or the T5 implementation (a hedged instantiation sketch follows the table). |
| Experiment Setup | Yes | Following the settings from MMN (Chen et al. 2021), the questions are truncated or padded to a fixed length of 32. The number of nodes in each program tree is limited to 9, and the maximum length k of each program is set to 8; the AdamW optimizer is used with a learning rate of 1e-4 and a batch size of 32 to fine-tune T5 for 400k steps (see the configuration sketch below the table). |
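
The paper names its training components (T5-base, AdamW) but pins no versions. A minimal sketch of how they could be instantiated, assuming the Hugging Face `transformers` library, PyTorch, and the `t5-base` checkpoint; none of these library or checkpoint choices are confirmed by the paper:

```python
# Minimal sketch of the components the paper names (T5-base, AdamW).
# The checkpoint name and library choices are assumptions; the paper
# pins no versions for Python, PyTorch, or the T5 implementation.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# AdamW with the learning rate reported in the paper (1e-4).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```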
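
The reported hyperparameters can likewise be collected into a single config object. A hedged reconstruction: the field names are illustrative assumptions, and only the numeric values come from the paper.

```python
# Hedged reconstruction of the reported experiment setup. Field names are
# illustrative assumptions; only the numeric values come from the paper.
from dataclasses import dataclass

from transformers import T5TokenizerFast


@dataclass
class DISTrainConfig:
    max_question_len: int = 32       # questions truncated or padded to 32 tokens
    max_program_nodes: int = 9       # node limit per program tree
    max_program_len: int = 8         # maximum length k of each program
    learning_rate: float = 1e-4      # AdamW learning rate
    batch_size: int = 32
    t5_finetune_steps: int = 400_000


cfg = DISTrainConfig()
tokenizer = T5TokenizerFast.from_pretrained("t5-base")  # assumed checkpoint
encoded = tokenizer(
    "What color is the cup on the table?",  # placeholder GQA-style question
    padding="max_length",
    truncation=True,
    max_length=cfg.max_question_len,
    return_tensors="pt",
)
```

Collecting the limits (32 / 9 / 8) in one place makes it straightforward to check a reimplementation against the MMN settings the paper inherits.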