Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Authors: Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models, achieving state-of-the-art on this task. |
| Researcher Affiliation | Academia | Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee (Georgia Institute of Technology); {sainandancv, aishwarya, steflee}@gatech.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. (A hedged sketch of the method's adversarial objective is given after this table.) |
| Open Source Code | No | The paper references public codebases for the base models (e.g., "SAN Codebase: https://github.com/abhshkdz/neural-vqa-attention") but gives no statement or link releasing code for its own proposed method. |
| Open Datasets | Yes | We experiment on the VQA-CP dataset [2] with multiple base VQA models, and find 1) our approach provides consistent improvements over all baseline VQA models... We train our models on the VQA-CP [2] train split and evaluate on the test set using the standard VQA evaluation metric [6]. |
| Dataset Splits | Yes | We train our models on the VQA-CP [2] train split and evaluate on the test set using the standard VQA evaluation metric [6] (sketched below). For each model, we also report results when trained and evaluated on the standard VQA train and validation splits [6, 12] with the same regularization coefficients used for VQA-CP, to compare with [2]. |
| Hardware Specification | Yes | The model takes 8 hours to train on a TITAN X for SAN (Torch, 60 epochs) and < 1 hour for Up-Down (PyTorch, 40 epochs). |
| Software Dependencies | No | The paper names Torch and PyTorch as training frameworks but does not give version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set batch size to 150, learning rate to 0.001, weight decay of 0.999, and use the Adam optimizer. (A configuration sketch follows the table.) |
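Since the paper ships no pseudocode or code for the proposed method, the following is a minimal PyTorch sketch of the core idea: a question-only adversary tries to predict the answer from the question encoding alone, and a gradient reversal layer turns the adversary's success into a penalty on the shared question encoder. All names here (`GradReverse`, `QuestionOnlyAdversary`, `q_encoding`) are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda in the
    backward pass, so parameters upstream of this layer are updated to
    *increase* whatever loss is computed downstream of it."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.lambd, None


class QuestionOnlyAdversary(nn.Module):
    """Predicts the answer from the question encoding alone; gradient
    reversal makes the shared encoder work against it."""

    def __init__(self, q_dim, num_answers, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(q_dim, num_answers)

    def forward(self, q_encoding):
        reversed_q = GradReverse.apply(q_encoding, self.lambd)
        return self.classifier(reversed_q)


# Training-step sketch (names illustrative):
#   q_enc      = question_encoder(question)        # shared encoder
#   vqa_logits = base_vqa_model(image_feats, q_enc)
#   adv_logits = adversary(q_enc)
#   loss = ce(vqa_logits, answer) + ce(adv_logits, answer)
#   loss.backward()  # reversal flips the adversary's gradient w.r.t. q_enc
```

The adversary itself descends on its cross-entropy loss, while the reversed gradient pushes the question encoder to make the question alone uninformative about the answer, which is the intended pressure against language priors.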
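The "standard VQA evaluation metric [6]" cited in the table is the consensus accuracy min(#matching humans / 3, 1). The helper below is a simplified sketch; the official evaluation additionally normalizes answer strings and averages over subsets of the ten annotators.

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA consensus accuracy: an answer counts as fully
    correct if at least 3 of the 10 human annotators gave it."""
    matches = sum(ans == predicted for ans in human_answers)
    return min(matches / 3.0, 1.0)


# Example: 2 of 10 annotators agree -> partial credit of 2/3.
print(vqa_accuracy("blue", ["blue", "blue"] + ["green"] * 8))  # 0.666...
```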
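Finally, a runnable sketch of the reported training configuration (batch size 150, learning rate 0.001, Adam). Reading the quoted "decay of 0.999" as a multiplicative per-step learning-rate decay is an assumption here (0.999 would be implausibly large as an L2 weight-decay coefficient), and the linear model is a stand-in for the actual VQA network.

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 3000)  # stand-in; only the optimizer wiring matters
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Assumption: "decay of 0.999" taken as multiplicative lr decay per step.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)
criterion = nn.CrossEntropyLoss()

for step in range(10):                    # training-loop skeleton
    x = torch.randn(150, 1024)            # batch size 150, per the paper
    y = torch.randint(0, 3000, (150,))
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()                      # lr *= 0.999 after each step
```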