Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
Authors: Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that it can improve performance significantly on a bias-sensitive split of the VQA dataset for multiple base models achieving state-of-the-art on this task. |
| Researcher Affiliation | Academia | Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee; Georgia Institute of Technology |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using public codebases for base models (e.g., "SAN Codebase: https://github.com/abhshkdz/neural-vqa-attention"), but does not provide a clear statement or link for the open-sourcing of their own proposed methodology's code. |
| Open Datasets | Yes | We experiment on the VQA-CP dataset [2] with multiple base VQA models, and find 1) our approach provides consistent improvements over all baseline VQA models... We train our models on the VQA-CP [2] train split and evaluate on the test set using the standard VQA evaluation metric [6]. |
| Dataset Splits | Yes | We train our models on the VQA-CP [2] train split and evaluate on the test set using the standard VQA evaluation metric [6]. For each model, we also report results when trained and evaluated on the standard VQA train and validation splits [6, 12] with the same regularization coefficients used for VQA-CP to compare with [2]. |
| Hardware Specification | Yes | The model takes 8 hours to train on a TITAN X for SAN (Torch, 60 epochs) and < 1 hour for Up-Down (PyTorch, 40 epochs). |
| Software Dependencies | No | The paper mentions "Torch" and "PyTorch" as frameworks used for training but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set batch size to 150, learning rate to 0.001, weight decay of 0.999 and use the Adam optimizer. |
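The Dataset Splits row refers to "the standard VQA evaluation metric [6]" without spelling it out. As a rough illustration, the sketch below scores one prediction with that metric: an answer counts as fully correct once at least three human annotators gave it, i.e. min(matches / 3, 1), averaged over every leave-one-out subset of the human answers. Answer normalization is simplified here to lowercasing and trimming whitespace (an assumption; the official evaluation script normalizes more aggressively), and the function name `vqa_accuracy` is hypothetical.

```python
from typing import Sequence


def vqa_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """Simplified VQA accuracy for a single question (illustrative sketch).

    For each annotator left out in turn, score min(matches / 3, 1) against
    the remaining answers, then average those scores. Normalization is only
    lowercasing and whitespace stripping (assumption, not the official rules).
    """
    if not human_answers:
        raise ValueError("need at least one human answer")
    pred = predicted.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = []
    for i in range(len(answers)):
        # Leave annotator i out and count agreement among the rest.
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


# Three of ten annotators said "yes": partial credit of about 0.9.
print(vqa_accuracy("yes", ["yes"] * 3 + ["no"] * 7))
```

With ten annotators, a prediction matching exactly one, two, or three of them earns about 0.3, 0.6, or 0.9 respectively, which is why the metric rewards answers that several humans agree on.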
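The Hardware Specification and Experiment Setup rows together give enough to record each reported training run and derive a per-epoch cost: 8 hours over 60 epochs is 8 minutes per epoch for SAN, while "< 1 hour" over 40 epochs bounds Up-Down at under 1.5 minutes per epoch. The sketch below is a hypothetical container for those quoted values; it assumes the batch size, learning rate, weight decay, and optimizer apply to both models, which the quoted text does not state explicitly, and it does not interpret how the quoted weight decay of 0.999 is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Training details as quoted in the table (hypothetical container)."""

    model: str
    framework: str
    epochs: int
    train_hours: float  # wall-clock on one TITAN X; an upper bound for Up-Down
    # Assumption: these shared values apply to both base models.
    batch_size: int = 150
    learning_rate: float = 0.001
    weight_decay: float = 0.999  # value as quoted; how it is applied is not stated
    optimizer: str = "Adam"

    def minutes_per_epoch(self) -> float:
        """Average wall-clock minutes per training epoch."""
        return self.train_hours * 60.0 / self.epochs


SAN = ReportedSetup("SAN", "Torch", epochs=60, train_hours=8.0)
UP_DOWN = ReportedSetup("Up-Down", "PyTorch", epochs=40, train_hours=1.0)  # "< 1 hour"

print(SAN.minutes_per_epoch())      # 8.0
print(UP_DOWN.minutes_per_epoch())  # 1.5 (upper bound)
```

Freezing the dataclass keeps the recorded values from being changed accidentally when the setups are reused, for example when comparing a re-run against the reported timings.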