BERT & Family Eat Word Salad: Experiments with Text Understanding

Authors: Ashim Gupta, Giorgi Kvernadze, Vivek Srikumar

AAAI 2021, pp. 12946-12954

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that state-of-the-art models consistently fail to recognize them as ill-formed, and instead produce high confidence predictions on them. We apply the destructive transformation functions described earlier to each task's validation set. (See the transformation sketch after the table.)
Researcher Affiliation | Academia | Ashim Gupta, Giorgi Kvernadze, Vivek Srikumar, University of Utah, {ashim, giorgi, svivek}@cs.utah.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/utahnlp/word-salad
Open Datasets | Yes | We use the MNLI (Williams, Nangia, and Bowman 2018) and SNLI (Bowman et al. 2015) datasets. For this task, we use the Microsoft Research Paraphrase Corpus (MRPC, Dolan and Brockett 2005), and Quora Question Pair (QQP) dataset. We use the Stanford Sentiment Treebank (SST-2, Socher et al. 2013).
Dataset Splits | Yes | All three introduce new hyperparameters, which are tuned on a validation set constructed by sampling 10% of the training set. The final models are then trained on the full training set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments; it only mentions the models used, such as RoBERTa.
Software Dependencies | No | The paper mentions the Moses SMT toolkit (Koehn et al. 2007) but does not give version numbers for any software dependencies needed to replicate the experiments (e.g., programming language, libraries, or frameworks).
Experiment Setup | Yes | We use the base variant of RoBERTa that is fine-tuned for three epochs across all our experiments, using hyperparameters suggested by the original paper. (See the fine-tuning sketch after the table.)
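The "destructive transformation functions" quoted under Research Type are the paper's word salad perturbations, which scramble well-formed inputs before they are passed to a fine-tuned classifier. The snippet below is a minimal sketch of one such transformation, assuming a simple random permutation of whitespace-separated tokens; the authors' actual transformation functions are defined in their repository (https://github.com/utahnlp/word-salad) and may differ.

```python
import random

def shuffle_words(sentence, seed=None):
    """Destroy word order by randomly permuting a sentence's tokens.

    Illustrative 'word salad' transformation only; the paper's real
    transformation functions live in the authors' repository.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

# The evaluation protocol applies such functions to each task's validation
# set and then checks whether the model still predicts with high confidence.
original = "A soccer game with multiple males playing."
print(shuffle_words(original, seed=0))
```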
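The Experiment Setup and Dataset Splits rows describe fine-tuning RoBERTa-base for three epochs, with hyperparameters tuned on a validation set sampled from 10% of the training data and the final model trained on the full training set. The sketch below illustrates that setup under several assumptions: the use of Hugging Face transformers and datasets, SST-2 as the example task, and the batch size and learning rate values are not confirmed by the paper; the authors' released code should be treated as authoritative.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# SST-2 is one of the datasets listed under "Open Datasets" above.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

# Hyperparameters are tuned on a validation set sampled from 10% of the
# training data; the final model is then retrained on the full training set.
split = encoded["train"].train_test_split(test_size=0.1, seed=42)

args = TrainingArguments(
    output_dir="roberta-base-sst2",
    num_train_epochs=3,               # three epochs, as stated in the paper
    per_device_train_batch_size=32,   # assumed value, not from the paper
    learning_rate=2e-5,               # assumed value, not from the paper
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=split["train"],
    eval_dataset=split["test"],
)
trainer.train()
```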