A Case Study of the Shortcut Effects in Visual Commonsense Reasoning

Authors: Keren Ye, Adriana Kovashka

AAAI 2021, pp. 3181-3189 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We first quantify the impact of shortcuts on state-of-the-art models. We propose two methods to augment VCR evaluation. We show the performance of SOTA methods drops significantly on the modified evaluation data. We qualitatively demonstrate, then quantitatively measure, the effect of shortcuts through our modified evaluations, on four recent and competitive VCR methods."
Researcher Affiliation | Academia | "Keren Ye, Adriana Kovashka, University of Pittsburgh, Pittsburgh PA 15260, USA, {yekeren,kovashka}@cs.pitt.edu"
Pseudocode | No | The paper describes methods in text and uses a diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code and data are available at https://github.com/yekeren/VCR-shortcut-effects-study."
Open Datasets | Yes | "All models that we train, including baselines, use BERT-Base (12 layers, 768 hidden units) and ResNet-101 (He et al. 2016) pre-trained on ImageNet (Deng et al. 2009), as the language and vision model backbones, respectively." "We discover a new type of bias in the Visual Commonsense Reasoning (VCR) dataset."
Dataset Splits | Yes | "We train for 50k steps (roughly 11 epochs) on the 212,923 training examples and save the model performing best on the validation set (26,534 samples), for each method in Table 5." (See the arithmetic check below the table.)
Hardware Specification | Yes | "We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework."
Software Dependencies | Yes | "All models that we train, including baselines, use BERT-Base (12 layers, 768 hidden units) and ResNet-101 (He et al. 2016) pre-trained on ImageNet (Deng et al. 2009), as the language and vision model backbones, respectively." "We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework."
Experiment Setup | Yes | "We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework." "We train for 50k steps (roughly 11 epochs) on the 212,923 training examples and save the model performing best on the validation set (26,534 samples), for each method in Table 5." (See the configuration sketch below the table.)
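
The "roughly 11 epochs" in the Dataset Splits row is consistent with the other quoted numbers. A one-line sanity check, with all three constants taken verbatim from the quotes above:

```python
# Sanity check: 50k steps at a global batch of 48 over 212,923 training examples.
steps, batch_size, num_train = 50_000, 48, 212_923
print(f"{steps * batch_size / num_train:.2f} epochs")  # 11.27 -> "roughly 11 epochs"
```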
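
For readers who want to mirror the quoted configuration, here is a minimal TensorFlow sketch assuming a standard data-parallel Keras setup. Only the hyperparameters (4 GPUs, global batch 48, Adam at 1e-5, 50k steps) come from the paper; `build_vcr_model` and `train_ds` are hypothetical placeholders, and the authors' actual implementation lives in the linked repository.

```python
import tensorflow as tf

GLOBAL_BATCH = 48       # quoted: 12 images per GPU x 4 GPUs
LEARNING_RATE = 1e-5    # quoted: learning rate of 1e-5
TRAIN_STEPS = 50_000    # quoted: 50k steps (roughly 11 epochs)

# Data-parallel training across all visible GPUs (4 GTX1080s in the paper).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Hypothetical builder: BERT-Base text encoder + ResNet-101 image backbone.
    model = build_vcr_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# train_ds is a hypothetical tf.data.Dataset over the 212,923 training examples.
model.fit(
    train_ds.repeat().batch(GLOBAL_BATCH),
    steps_per_epoch=TRAIN_STEPS,
    epochs=1,  # run the full 50k steps as a single pass
)
```

One note on model selection: the paper reports keeping the checkpoint that performs best on the 26,534-sample validation set, which in a Keras setup like this would typically be handled with a `ModelCheckpoint(save_best_only=True)` callback.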