A Case Study of the Shortcut Effects in Visual Commonsense Reasoning

Authors: Keren Ye, Adriana Kovashka

AAAI 2021, pp. 3181-3189 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We first quantify the impact of shortcuts on state-of-the-art models. We propose two methods to augment VCR evaluation. We show the performance of SOTA methods drops significantly on the modified evaluation data. We qualitatively demonstrate, then quantitatively measure, the effect of shortcuts through our modified evaluations, on four recent and competitive VCR methods."
Researcher Affiliation | Academia | "Keren Ye, Adriana Kovashka, University of Pittsburgh, Pittsburgh PA 15260, USA, {yekeren,kovashka}@cs.pitt.edu"
Pseudocode | No | The paper describes methods in text and uses a diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code and data are available at https://github.com/yekeren/VCR-shortcut-effects-study."
Open Datasets | Yes | "All models that we train, including baselines, use BERT-Base (12 layers, 768 hidden units) and ResNet-101 (He et al. 2016) pre-trained on ImageNet (Deng et al. 2009), as the language and vision model backbones, respectively." "We discover a new type of bias in the Visual Commonsense Reasoning (VCR) dataset."
Dataset Splits | Yes | "We train for 50k steps (roughly 11 epochs) on the 212,923 training examples and save the model performing best on the validation set (26,534 samples), for each method in Table 5." (See the arithmetic check below the table.)
Hardware Specification | Yes | "We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework."
Software Dependencies | Yes | "All models that we train, including baselines, use BERT-Base (12 layers, 768 hidden units) and ResNet-101 (He et al. 2016) pre-trained on ImageNet (Deng et al. 2009), as the language and vision model backbones, respectively." "We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework."
Experiment Setup | Yes | "We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework." "We train for 50k steps (roughly 11 epochs) on the 212,923 training examples and save the model performing best on the validation set (26,534 samples), for each method in Table 5." (See the configuration sketch below the table.)
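
The "roughly 11 epochs" in the Dataset Splits row is consistent with the other quoted numbers. A one-line sanity check, with all three constants taken verbatim from the quotes above:

```python
# Sanity check: 50k steps at a global batch of 48 over 212,923 training examples.
steps, batch_size, num_train = 50_000, 48, 212_923
print(f"{steps * batch_size / num_train:.2f} epochs")  # 11.27 -> "roughly 11 epochs"
```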
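
For readers who want to mirror the quoted configuration, here is a minimal TensorFlow sketch assuming a standard data-parallel Keras setup. Only the hyperparameters (4 GPUs, global batch 48, Adam at 1e-5, 50k steps) come from the paper; `build_vcr_model` and `train_ds` are hypothetical placeholders, and the authors' actual implementation lives in the linked repository.

```python
import tensorflow as tf

GLOBAL_BATCH = 48       # quoted: 12 images per GPU x 4 GPUs
LEARNING_RATE = 1e-5    # quoted: learning rate of 1e-5
TRAIN_STEPS = 50_000    # quoted: 50k steps (roughly 11 epochs)

# Data-parallel training across all visible GPUs (4 GTX1080s in the paper).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Hypothetical builder: BERT-Base text encoder + ResNet-101 image backbone.
    model = build_vcr_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# train_ds is a hypothetical tf.data.Dataset over the 212,923 training examples.
model.fit(
    train_ds.repeat().batch(GLOBAL_BATCH),
    steps_per_epoch=TRAIN_STEPS,
    epochs=1,  # run the full 50k steps as a single pass
)
```

One note on model selection: the paper reports keeping the checkpoint that performs best on the 26,534-sample validation set, which in a Keras setup like this would typically be handled with a `ModelCheckpoint(save_best_only=True)` callback.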