A Case Study of the Shortcut Effects in Visual Commonsense Reasoning
Authors: Keren Ye, Adriana Kovashka
AAAI 2021, pp. 3181-3189 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We first quantify the impact of shortcuts on state-of-the-art models. We propose two methods to augment VCR evaluation. We show the performance of SOTA methods drops significantly on the modified evaluation data. We qualitatively demonstrate, then quantitatively measure, the effect of shortcuts through our modified evaluations on four recent and competitive VCR methods. |
| Researcher Affiliation | Academia | Keren Ye, Adriana Kovashka University of Pittsburgh, Pittsburgh PA 15260, USA {yekeren,kovashka}@cs.pitt.edu |
| Pseudocode | No | The paper describes methods in text and uses a diagram (Figure 2), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and data are available at https://github.com/yekeren/VCR-shortcut-effects-study. |
| Open Datasets | Yes | All models that we train, including baselines, use BERT-Base (12 layers, 768 hidden units) and ResNet-101 (He et al. 2016) pre-trained on ImageNet (Deng et al. 2009), as the language and vision model backbones, respectively. We discover a new type of bias in the Visual Commonsense Reasoning (VCR) dataset. |
| Dataset Splits | Yes | We train for 50k steps (roughly 11 epochs) on the 212,923 training examples and save the model performing best on the validation set (26,534 samples), for each method in Table 5. |
| Hardware Specification | Yes | We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework. |
| Software Dependencies | Yes | All models that we train, including baselines, use BERT-Base (12 layers, 768 hidden units) and ResNet-101 (He et al. 2016) pre-trained on ImageNet (Deng et al. 2009), as the language and vision model backbones, respectively. We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework. |
| Experiment Setup | Yes | We use 4 GTX1080 GPUs, batch size of 48 (12 images per GPU), learning rate of 1e-5, ADAM optimizer, and the Tensorflow framework. We train for 50k steps (roughly 11 epochs) on the 212,923 training examples and save the model performing best on the validation set (26,534 samples), for each method in Table 5. |
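
To make the Software Dependencies row concrete, the sketch below shows one way the reported backbones could be instantiated. It assumes TensorFlow 2 and the HuggingFace Transformers library, and the checkpoint name `bert-base-uncased` is an assumption on our part; the authors' actual loading code lives in the linked repository.

```python
import tensorflow as tf
from transformers import TFBertModel  # assumption: HuggingFace Transformers is available

# Language backbone: BERT-Base (12 layers, 768 hidden units).
# "bert-base-uncased" is an assumed checkpoint name, not stated in the paper.
bert = TFBertModel.from_pretrained("bert-base-uncased")

# Vision backbone: ResNet-101 pre-trained on ImageNet, without the classification head.
resnet = tf.keras.applications.ResNet101(weights="imagenet", include_top=False)
```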
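Similarly, the Experiment Setup and Dataset Splits rows map onto the following minimal TensorFlow configuration sketch. The constants mirror the reported values (4 GPUs, global batch size 48, Adam at 1e-5, 50k steps over 212,923 training examples, model selection on the 26,534-sample validation split); the model definition and VCR input pipeline are left to the authors' repository, so they appear here only in comments.

```python
import tensorflow as tf

# Reported settings: 4 GPUs (12 images per GPU), global batch size 48,
# Adam optimizer at 1e-5, 50k steps (roughly 11 epochs over 212,923 examples).
strategy = tf.distribute.MirroredStrategy()
GLOBAL_BATCH_SIZE = 48
TOTAL_STEPS = 50_000
STEPS_PER_EPOCH = 212_923 // GLOBAL_BATCH_SIZE        # ~4,435 steps per epoch
EPOCHS = round(TOTAL_STEPS / STEPS_PER_EPOCH)         # ~11 epochs

with strategy.scope():
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)

# Keep the weights that perform best on the 26,534-sample validation split.
best_ckpt = tf.keras.callbacks.ModelCheckpoint(
    "best_model.ckpt", monitor="val_accuracy",
    save_best_only=True, save_weights_only=True)

# The model and the VCR input pipeline come from the linked repository; training
# would then be: model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS,
#                          steps_per_epoch=STEPS_PER_EPOCH, callbacks=[best_ckpt])
```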