Self-Critical Reasoning for Robust Visual Question Answering
Authors: Jialin Wu, Raymond Mooney
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state-of-the-art, i.e., 49.5% using textual explanations and 48.5% using automatically annotated regions. |
| Researcher Affiliation | Academia | Jialin Wu, Department of Computer Science, University of Texas at Austin, jialinwu@utexas.edu; Raymond J. Mooney, Department of Computer Science, University of Texas at Austin, mooney@cs.utexas.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/jialinwu17/Self_Critical_VQA. |
| Open Datasets | Yes | We evaluate our approach on the VQA generalization task using the VQA-CP dataset... We also report our system's performance on the balanced VQA v2 validation set for completeness. The Expl. column shows the source of explanations for training the VQA systems. |
| Dataset Splits | Yes | We first pre-train our base UpDn VQA system on the VQA-CP training set using standard VQA loss L_vqa (binary cross-entropy loss with soft scores as supervision) with the Adam optimizer [16] for at most 20 epochs... Then, we fine-tune our system to recognize important objects using L_vqa + λ_infl L_infl with a learning rate of 10^-5 for at most 15 epochs on the intersection of the VQA-X and VQA-CP training sets. (A minimal sketch of this pre-training loss appears after the table.) |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions software tools and libraries such as the Adam optimizer, GRUs, GloVe embeddings, Faster R-CNN, ResNet-101, and the spaCy POS tagger, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We first pre-train our base UpDn VQA system on the VQA-CP training set using standard VQA loss L_vqa (binary cross-entropy loss with soft scores as supervision) with the Adam optimizer [16] for at most 20 epochs. As suggested in [27], the learning rate is fixed to 10^-3 with a batch size of 384 during the pre-training process, and we use 1,280 hidden units in the base UpDn VQA system. Then, we fine-tune our system to recognize important objects using L_vqa + λ_infl L_infl with a learning rate of 10^-5 for at most 15 epochs... Finally, we fine-tune the system with the joint loss L = L_vqa + λ_infl L_infl + λ_crit L_crit for at most 15 epochs with a learning rate of 10^-5... The bucket size \|B\| of the competitive answers is set to 5... (A schematic sketch of this joint objective appears below.) |
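
For concreteness, here is a minimal PyTorch sketch of the pre-training objective quoted in the Dataset Splits and Experiment Setup rows: binary cross-entropy against soft answer scores, optimized with Adam at a fixed 10^-3 learning rate. `TinyVQAHead`, the 3,129-answer vocabulary size, and the random tensors are illustrative stand-ins, not the paper's UpDn architecture or data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVQAHead(nn.Module):
    """Stand-in for the UpDn VQA system (NOT the paper's architecture):
    fuses pooled object features with a question embedding, then scores answers."""
    def __init__(self, feat_dim=2048, q_dim=300, hidden=1280, num_answers=3129):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(feat_dim + q_dim, hidden), nn.ReLU())
        self.clf = nn.Linear(hidden, num_answers)

    def forward(self, obj_feats, q_emb):
        pooled = obj_feats.mean(dim=1)                       # (B, feat_dim)
        return self.clf(self.fuse(torch.cat([pooled, q_emb], dim=-1)))

def vqa_loss(logits, soft_scores):
    """L_vqa: binary cross-entropy with soft scores as supervision."""
    return F.binary_cross_entropy_with_logits(logits, soft_scores)

model = TinyVQAHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)          # fixed 10^-3, as quoted

# Dummy batch standing in for a VQA-CP mini-batch (batch size 384 in the paper).
obj_feats = torch.randn(384, 36, 2048)   # e.g. 36 Faster R-CNN object features
q_emb = torch.randn(384, 300)            # e.g. pooled GloVe word embeddings
soft_scores = torch.rand(384, 3129)      # soft answer targets in [0, 1]

logits = model(obj_feats, q_emb)
loss = vqa_loss(logits, soft_scores)
opt.zero_grad()
loss.backward()
opt.step()
```

The soft scores come from annotator agreement in the VQA annotations, so `binary_cross_entropy_with_logits` with fractional targets matches the "soft scores as supervision" phrasing directly.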
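The joint fine-tuning objective L = L_vqa + λ_infl L_infl + λ_crit L_crit can be sketched as follows. This is a schematic reconstruction, not the paper's implementation: `sensitivity` approximates the paper's answer-to-object sensitivity with a gradient × input attribution, the influence and self-critical terms are illustrative forms, and all λ values, shapes, and helper names are placeholders.

```python
import torch
import torch.nn.functional as F

def sensitivity(answer_prob, obj_feats):
    """Schematic stand-in for the paper's answer-to-object sensitivity S(a, v_i),
    approximated here by a gradient x input attribution over object features."""
    grads = torch.autograd.grad(answer_prob.sum(), obj_feats, create_graph=True)[0]
    return F.relu((grads * obj_feats).sum(dim=-1))           # (B, N)

def influence_and_critical_losses(probs, obj_feats, gt_idx, influ_mask, bucket=5):
    """Illustrative L_infl and L_crit; the exact forms are in the paper.

    probs:      (B, A) answer probabilities, computed from obj_feats
    obj_feats:  (B, N, D) object features with requires_grad=True
    gt_idx:     (B,) ground-truth answer indices
    influ_mask: (B, N) binary mask of annotated influential objects
    """
    gt_prob = probs.gather(1, gt_idx[:, None]).squeeze(1)
    s_gt = sensitivity(gt_prob, obj_feats)                   # (B, N)

    # L_infl: raise the gt answer's sensitivity to the influential objects.
    l_infl = -(s_gt * influ_mask).sum(dim=1).mean()

    # v*: the most influential object for the ground-truth answer.
    star = s_gt.argmax(dim=1, keepdim=True)                  # (B, 1)

    # L_crit: penalize top-|B| competitive answers that are more sensitive
    # to v* than the gt answer is (the gt answer itself contributes zero).
    top_idx = probs.topk(bucket, dim=1).indices              # (B, |B|)
    l_crit = probs.new_zeros(())
    for k in range(bucket):
        a_prob = probs.gather(1, top_idx[:, k:k + 1]).squeeze(1)
        s_a = sensitivity(a_prob, obj_feats)
        l_crit = l_crit + F.relu(s_a.gather(1, star) - s_gt.gather(1, star)).mean()
    return l_infl, l_crit

# Tiny runnable demo with random tensors (illustrative shapes, not real data).
obj_feats = torch.randn(2, 36, 2048, requires_grad=True)
answer_head = torch.randn(2048, 100)                         # placeholder classifier
probs = torch.softmax(obj_feats.mean(dim=1) @ answer_head, dim=-1)
gt_idx = torch.tensor([3, 7])
influ_mask = torch.zeros(2, 36)
influ_mask[:, :5] = 1.0                                      # pretend 5 objects matter
l_infl, l_crit = influence_and_critical_losses(probs, obj_feats, gt_idx, influ_mask)
l_vqa = torch.zeros(())               # stands in for the BCE term in the first sketch
lambda_infl, lambda_crit = 1.0, 1.0   # placeholder weights, not the paper's values
loss = l_vqa + lambda_infl * l_infl + lambda_crit * l_crit
```

The loop over the top-`bucket` answers mirrors the quoted bucket size \|B\| = 5, and because the ground-truth answer's sensitivity gap to itself is zero, it need not be excluded from the bucket in this sketch.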