BERT & Family Eat Word Salad: Experiments with Text Understanding

Authors: Ashim Gupta, Giorgi Kvernadze, Vivek Srikumar

AAAI 2021, pp. 12946-12954

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that state-of-the-art models consistently fail to recognize them as ill-formed, and instead produce high confidence predictions on them. We apply the destructive transformation functions described earlier to each task's validation set. (See the transformation sketch after the table.)
Researcher Affiliation | Academia | Ashim Gupta, Giorgi Kvernadze, Vivek Srikumar, University of Utah, {ashim, giorgi, svivek}@cs.utah.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/utahnlp/word-salad
Open Datasets | Yes | We use the MNLI (Williams, Nangia, and Bowman 2018) and SNLI (Bowman et al. 2015) datasets. For this task, we use the Microsoft Research Paraphrase Corpus (MRPC, Dolan and Brockett 2005), and Quora Question Pair (QQP) dataset. We use the Stanford Sentiment Treebank (SST-2, Socher et al. 2013).
Dataset Splits | Yes | All three introduce new hyperparameters, which are tuned on a validation set constructed by sampling 10% of the training set. The final models are then trained on the full training set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments; it only mentions the models used, such as RoBERTa.
Software Dependencies | No | The paper mentions the Moses SMT toolkit (Koehn et al. 2007) but does not give version numbers for any software dependencies needed to replicate the experiments (e.g., programming language, libraries, or frameworks).
Experiment Setup | Yes | We use the base variant of RoBERTa that is fine-tuned for three epochs across all our experiments, using hyperparameters suggested by the original paper. (See the fine-tuning sketch after the table.)
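The "destructive transformation functions" quoted under Research Type are the paper's word salad perturbations, which scramble well-formed inputs before they are passed to a fine-tuned classifier. The snippet below is a minimal sketch of one such transformation, assuming a simple random permutation of whitespace-separated tokens; the authors' actual transformation functions are defined in their repository (https://github.com/utahnlp/word-salad) and may differ.

```python
import random

def shuffle_words(sentence, seed=None):
    """Destroy word order by randomly permuting a sentence's tokens.

    Illustrative 'word salad' transformation only; the paper's real
    transformation functions live in the authors' repository.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

# The evaluation protocol applies such functions to each task's validation
# set and then checks whether the model still predicts with high confidence.
original = "A soccer game with multiple males playing."
print(shuffle_words(original, seed=0))
```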
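The Experiment Setup and Dataset Splits rows describe fine-tuning RoBERTa-base for three epochs, with hyperparameters tuned on a validation set sampled from 10% of the training data and the final model trained on the full training set. The sketch below illustrates that setup under several assumptions: the use of Hugging Face transformers and datasets, SST-2 as the example task, and the batch size and learning rate values are not confirmed by the paper; the authors' released code should be treated as authoritative.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# SST-2 is one of the datasets listed under "Open Datasets" above.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

# Hyperparameters are tuned on a validation set sampled from 10% of the
# training data; the final model is then retrained on the full training set.
split = encoded["train"].train_test_split(test_size=0.1, seed=42)

args = TrainingArguments(
    output_dir="roberta-base-sst2",
    num_train_epochs=3,               # three epochs, as stated in the paper
    per_device_train_batch_size=32,   # assumed value, not from the paper
    learning_rate=2e-5,               # assumed value, not from the paper
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
)
trainer.train()
```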