Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Self-Critical Reasoning for Robust Visual Question Answering
Authors: Jialin Wu, Raymond Mooney
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on the VQA generalization task using the VQA-CP dataset, achieving a new state of the art, i.e., 49.5% using textual explanations and 48.5% using automatically annotated regions. |
| Researcher Affiliation | Academia | Jialin Wu, Department of Computer Science, University of Texas at Austin, EMAIL; Raymond J. Mooney, Department of Computer Science, University of Texas at Austin, EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/jialinwu17/Self_Critical_VQA. |
| Open Datasets | Yes | We evaluate our approach on the VQA generalization task using the VQA-CP dataset... We also report our system's performance on the balanced VQA v2 validation set for completeness. The Expl. column shows the source of explanations for training the VQA systems. |
| Dataset Splits | Yes | We first pre-train our base UpDn VQA system on the VQA-CP training set using standard VQA loss L_vqa (binary cross-entropy loss with soft scores as supervision) with the Adam optimizer [16] for at most 20 epochs... Then, we fine-tune our system to recognize important objects using L_vqa + λ_infl L_infl with a learning rate of 10e-5 for at most 15 epochs on the intersection of the VQA-X and VQA-CP training sets. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. |
| Software Dependencies | No | The paper mentions software tools and libraries like the Adam optimizer, GRU, GloVe embeddings, Faster R-CNN, ResNet-101, and the spaCy POS tagger, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We first pre-train our base UpDn VQA system on the VQA-CP training set using standard VQA loss L_vqa (binary cross-entropy loss with soft scores as supervision) with the Adam optimizer [16] for at most 20 epochs. As suggested in [27], the learning rate is fixed to 10e-3 with a batch size of 384 during the pre-training process, and we use 1,280 hidden units in the base UpDn VQA system. Then, we fine-tune our system to recognize important objects using L_vqa + λ_infl L_infl with a learning rate of 10e-5 for at most 15 epochs... Finally, we fine-tune the system with the joint loss L = L_vqa + λ_infl L_infl + λ_crit L_crit for at most 15 epochs with a learning rate of 10e-5... The bucket size |B| of the competitive answers is set to 5... |
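The reported setup describes a three-stage schedule (pre-training with L_vqa, then adding L_infl, then adding L_crit). The sketch below collects those hyperparameters into one place and shows the stable binary cross-entropy form commonly used with soft answer scores. It is a minimal illustration under stated assumptions, not the authors' code: the names `Stage`, `SCHEDULE`, `soft_bce_with_logits`, and `joint_loss` are hypothetical; "10e-3" and "10e-5" are read here as 1e-3 and 1e-5 (an assumption); averaging the loss over answers is an assumption; and L_infl and L_crit are treated as precomputed scalars because their definitions and the λ values are not given in the table.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Stage:
    """One training stage, built from the hyperparameters reported above (hypothetical structure)."""
    name: str
    losses: Tuple[str, ...]           # loss terms summed in this stage
    learning_rate: float              # ASSUMPTION: "10e-3"/"10e-5" read as 1e-3/1e-5
    max_epochs: int                   # "at most N epochs"
    batch_size: Optional[int] = None  # only reported for pre-training


SCHEDULE = [
    Stage("pretrain", ("vqa",), 1e-3, 20, batch_size=384),
    Stage("influence_finetune", ("vqa", "infl"), 1e-5, 15),
    Stage("self_critical_finetune", ("vqa", "infl", "crit"), 1e-5, 15),
]

HIDDEN_UNITS = 1280  # hidden units in the base UpDn system
BUCKET_SIZE = 5      # |B|, number of competitive answers


def soft_bce_with_logits(logits: List[float], targets: List[float]) -> float:
    """Binary cross-entropy over candidate answers with soft targets in [0, 1].

    Uses the numerically stable identity
    -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))] = max(z, 0) - z*y + log(1 + exp(-|z|)).
    Averaging over answers is an assumption.
    """
    if len(logits) != len(targets) or not logits:
        raise ValueError("logits and targets must be non-empty and equal length")
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def joint_loss(l_vqa: float, l_infl: float, l_crit: float,
               lam_infl: float, lam_crit: float) -> float:
    """Final-stage objective L = L_vqa + lambda_infl * L_infl + lambda_crit * L_crit."""
    return l_vqa + lam_infl * l_infl + lam_crit * l_crit


if __name__ == "__main__":
    for stage in SCHEDULE:
        print(stage.name, stage.losses, stage.learning_rate, stage.max_epochs)
```

Keeping each stage as an immutable record makes it easy to check a reimplementation against the reported values; the λ weights are left as arguments since the table does not state them.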