e-SNLI: Natural Language Inference with Natural Language Explanations

Authors: Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In order to demonstrate the efficacy of the e-SNLI dataset, we first show that it is much more difficult to produce correct explanations based on spurious correlations than to produce correct labels. We then implement models that, given a premise and a hypothesis, predict a label and an explanation. We also investigate how the additional signal from explanations received at train time can guide models into learning better sentence representations. Finally, we look into the transfer capabilities of our model to out-of-domain NLI datasets. Throughout our experiments, our models follow the architecture presented in Conneau et al. [7], as we build directly on top of their code."
Researcher Affiliation | Collaboration | Oana-Maria Camburu¹, Tim Rocktäschel², Thomas Lukasiewicz¹,³, Phil Blunsom¹,⁴; {oana-maria.camburu, thomas.lukasiewicz, phil.blunsom}@cs.ox.ac.uk; t.rocktaschel@ucl.ac.uk; ¹Department of Computer Science, University of Oxford; ²Department of Computer Science, University College London; ³Alan Turing Institute, London, UK; ⁴DeepMind, London, UK
Pseudocode | No | The paper describes model architectures and training procedures in text but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/OanaMariaCamburu/e-SNLI
Open Datasets | Yes | "We extend the Stanford Natural Language Inference dataset with an additional layer of human-annotated natural language explanations of the entailment relations. We call our explanation-augmented dataset e-SNLI, which we collected to enable research in the direction of training with and generation of free-form textual justifications." The dataset is released at https://github.com/OanaMariaCamburu/e-SNLI (a loading sketch follows the table).
Dataset Splits | Yes | "We collected one explanation for each pair in the training set and three explanations for each pair in the validation and test sets." Model selection relies on "accuracy on the SNLI validation set and perplexity on the validation set of SNLI."
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or cloud computing specifications.
Software Dependencies | No | The paper names its code dependency only indirectly: "Throughout our experiments, our models follow the architecture presented in Conneau et al. [7], as we build directly on top of their code." The encoders are 2048-unit bidirectional LSTMs [15] with max-pooling, giving a sentence representation dimension of 4096; the label classifiers are 3-layer MLPs with an internal size of 512 and no non-linearities; the explanation decoders are single-layer LSTMs with internal sizes of 512, 1024, 2048, or 4096. To reduce the vocabulary for explanation generation, words appearing fewer than 15 times were replaced with <UNK>. Preprocessing and optimization were kept the same as in [7]. No explicit software versions are given. (An architecture sketch follows the table.)
Experiment Setup | Yes | "Throughout our experiments, our models follow the architecture presented in Conneau et al. [7], as we build directly on top of their code." The encoders are 2048-unit bidirectional LSTMs [15] with max-pooling (4096-dim sentence representations); the label classifiers are 3-layer MLPs with an internal size of 512 and no non-linearities; the explanation decoders are single-layer LSTMs with internal sizes of 512, 1024, 2048, and 4096. Negative log-likelihood is used for both the classification and explanation losses, weighted by a coefficient α ∈ [0, 1], giving the overall loss L_total = α · L_label + (1 − α) · L_explanation (Eq. 1). The authors consider α values from 0.1 to 0.9 in steps of 0.1 and, whenever appropriate, run models with five seeds, reporting the average performance with the standard deviation in parentheses. (A loss sketch follows the table.)
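
To make the dataset entry concrete, below is a minimal loading sketch. The file names (esnli_train_1.csv, esnli_train_2.csv, esnli_dev.csv, esnli_test.csv) and column names (gold_label, Sentence1, Sentence2, Explanation_1) follow the layout of the public repository as we understand it; treat them as assumptions and verify against the actual release.

```python
# Minimal e-SNLI loading sketch (pandas). File and column names are
# assumptions based on the public repository layout; verify before use.
import pandas as pd

# The training split ships in two CSV parts; dev and test are single files.
train = pd.concat(
    [pd.read_csv("esnli_train_1.csv"), pd.read_csv("esnli_train_2.csv")],
    ignore_index=True,
)
dev = pd.read_csv("esnli_dev.csv")
test = pd.read_csv("esnli_test.csv")

row = dev.iloc[0]
print(row["Sentence1"])      # premise
print(row["Sentence2"])      # hypothesis
print(row["gold_label"])     # entailment / neutral / contradiction
print(row["Explanation_1"])  # dev/test pairs carry three explanations each
```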
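The architecture reported under Software Dependencies and Experiment Setup can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the authors' code: the BiLSTM-max encoder dimensions come from the quoted text, while the 300-dim embeddings and the [u, v, |u − v|, u ∗ v] feature combination follow Conneau et al.'s InferSent recipe, which the report only references; all class and variable names are ours.

```python
# Illustrative PyTorch sketch of the reported architecture; not the authors' code.
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    """BiLSTM encoder with max-pooling: 2048 units per direction -> 4096-dim."""
    def __init__(self, vocab_size, emb_dim=300, hidden=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        states, _ = self.lstm(self.embed(tokens))  # (batch, seq_len, 2 * hidden)
        return states.max(dim=1).values            # max-pool over time

# 3-layer MLP classifier, 512 internal size, no non-linearities (as reported).
# The 4 * 4096 input size assumes InferSent's [u, v, |u - v|, u * v] features.
classifier = nn.Sequential(
    nn.Linear(4 * 4096, 512),
    nn.Linear(512, 512),
    nn.Linear(512, 3),  # entailment / neutral / contradiction
)

encoder = BiLSTMMaxEncoder(vocab_size=30000)
premise = torch.randint(0, 30000, (8, 20))     # dummy token ids
hypothesis = torch.randint(0, 30000, (8, 20))
u, v = encoder(premise), encoder(hypothesis)
logits = classifier(torch.cat([u, v, (u - v).abs(), u * v], dim=1))
```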
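Finally, the weighted objective of Eq. (1) is straightforward to express. A minimal sketch, assuming standard PyTorch cross-entropy (which computes the negative log-likelihood over softmax outputs); the function and argument names are ours:

```python
# Sketch of Eq. (1): L_total = alpha * L_label + (1 - alpha) * L_explanation.
import torch.nn.functional as F

def total_loss(label_logits, gold_labels, expl_logits, expl_targets,
               alpha=0.5, pad_id=0):
    # Classification NLL over the three SNLI labels.
    l_label = F.cross_entropy(label_logits, gold_labels)
    # Token-level NLL for the generated explanation, ignoring padding.
    l_expl = F.cross_entropy(
        expl_logits.reshape(-1, expl_logits.size(-1)),
        expl_targets.reshape(-1),
        ignore_index=pad_id,
    )
    # The paper sweeps alpha over 0.1 .. 0.9 in steps of 0.1.
    return alpha * l_label + (1 - alpha) * l_expl
```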