e-SNLI: Natural Language Inference with Natural Language Explanations

Authors: Oana-Maria Camburu, Tim Rocktäschel, Thomas Lukasiewicz, Phil Blunsom

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In order to demonstrate the efficacy of the e-SNLI dataset, we first show that it is much more difficult to produce correct explanations based on spurious correlations than to produce correct labels. We then implement models that, given a premise and a hypothesis, predict a label and an explanation. We also investigate how the additional signal from explanations received at train time can guide models into learning better sentence representations. Finally, we look into the transfer capabilities of our model to out-of-domain NLI datasets. Throughout our experiments, our models follow the architecture presented in Conneau et al. [7], as we build directly on top of their code."
Researcher Affiliation | Collaboration | Oana-Maria Camburu¹, Tim Rocktäschel², Thomas Lukasiewicz¹,³, Phil Blunsom¹,⁴; {oana-maria.camburu, thomas.lukasiewicz, phil.blunsom}@cs.ox.ac.uk; t.rocktaschel@ucl.ac.uk; ¹Department of Computer Science, University of Oxford; ²Department of Computer Science, University College London; ³Alan Turing Institute, London, UK; ⁴DeepMind, London, UK
Pseudocode | No | The paper describes model architectures and training procedures in text but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/OanaMariaCamburu/e-SNLI
Open Datasets | Yes | "We extend the Stanford Natural Language Inference dataset with an additional layer of human-annotated natural language explanations of the entailment relations. We call our explanation-augmented dataset e-SNLI, which we collected to enable research in the direction of training with and generation of free-form textual justifications." The dataset is released at https://github.com/OanaMariaCamburu/e-SNLI (a loading sketch follows the table).
Dataset Splits | Yes | "We collected one explanation for each pair in the training set and three explanations for each pair in the validation and test sets." Model selection relies on "accuracy on the SNLI validation set and perplexity on the validation set of SNLI."
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, or cloud computing specifications.
Software Dependencies | No | The paper names its code dependency only indirectly: "Throughout our experiments, our models follow the architecture presented in Conneau et al. [7], as we build directly on top of their code." The encoders are 2048-unit bidirectional LSTMs [15] with max-pooling, giving a sentence representation dimension of 4096; the label classifiers are 3-layer MLPs with an internal size of 512 and no non-linearities; the explanation decoders are single-layer LSTMs with internal sizes of 512, 1024, 2048, or 4096. To reduce the vocabulary for explanation generation, words appearing fewer than 15 times were replaced with <UNK>. Preprocessing and optimization were kept the same as in [7]. No explicit software versions are given. (An architecture sketch follows the table.)
Experiment Setup | Yes | "Throughout our experiments, our models follow the architecture presented in Conneau et al. [7], as we build directly on top of their code." The encoders are 2048-unit bidirectional LSTMs [15] with max-pooling (4096-dim sentence representations); the label classifiers are 3-layer MLPs with an internal size of 512 and no non-linearities; the explanation decoders are single-layer LSTMs with internal sizes of 512, 1024, 2048, and 4096. Negative log-likelihood is used for both the classification and explanation losses, weighted by a coefficient α ∈ [0, 1], giving the overall loss L_total = α · L_label + (1 − α) · L_explanation (Eq. 1). The authors consider α values from 0.1 to 0.9 in steps of 0.1 and, whenever appropriate, run models with five seeds, reporting the average performance with the standard deviation in parentheses. (A loss sketch follows the table.)
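
To make the dataset entry concrete, below is a minimal loading sketch. The file names (esnli_train_1.csv, esnli_train_2.csv, esnli_dev.csv, esnli_test.csv) and column names (gold_label, Sentence1, Sentence2, Explanation_1) follow the layout of the public repository as we understand it; treat them as assumptions and verify against the actual release.

```python
# Minimal e-SNLI loading sketch (pandas). File and column names are
# assumptions based on the public repository layout; verify before use.
import pandas as pd

# The training split ships in two CSV parts; dev and test are single files.
train = pd.concat(
    [pd.read_csv("esnli_train_1.csv"), pd.read_csv("esnli_train_2.csv")],
    ignore_index=True,
)
dev = pd.read_csv("esnli_dev.csv")
test = pd.read_csv("esnli_test.csv")

row = dev.iloc[0]
print(row["Sentence1"])      # premise
print(row["Sentence2"])      # hypothesis
print(row["gold_label"])     # entailment / neutral / contradiction
print(row["Explanation_1"])  # dev/test pairs carry three explanations each
```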
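The architecture reported under Software Dependencies and Experiment Setup can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the authors' code: the BiLSTM-max encoder dimensions come from the quoted text, while the 300-dim embeddings and the [u, v, |u − v|, u ∗ v] feature combination follow Conneau et al.'s InferSent recipe, which the report only references; all class and variable names are ours.

```python
# Illustrative PyTorch sketch of the reported architecture; not the authors' code.
import torch
import torch.nn as nn

class BiLSTMMaxEncoder(nn.Module):
    """BiLSTM encoder with max-pooling: 2048 units per direction -> 4096-dim."""
    def __init__(self, vocab_size, emb_dim=300, hidden=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, tokens):                     # tokens: (batch, seq_len)
        states, _ = self.lstm(self.embed(tokens))  # (batch, seq_len, 2 * hidden)
        return states.max(dim=1).values            # max-pool over time

# 3-layer MLP classifier, 512 internal size, no non-linearities (as reported).
# The 4 * 4096 input size assumes InferSent's [u, v, |u - v|, u * v] features.
classifier = nn.Sequential(
    nn.Linear(4 * 4096, 512),
    nn.Linear(512, 512),
    nn.Linear(512, 3),  # entailment / neutral / contradiction
)

encoder = BiLSTMMaxEncoder(vocab_size=30000)
premise = torch.randint(0, 30000, (8, 20))     # dummy token ids
hypothesis = torch.randint(0, 30000, (8, 20))
u, v = encoder(premise), encoder(hypothesis)
logits = classifier(torch.cat([u, v, (u - v).abs(), u * v], dim=1))
```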
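Finally, the weighted objective of Eq. (1) is straightforward to express. A minimal sketch, assuming standard PyTorch cross-entropy (which computes the negative log-likelihood over softmax outputs); the function and argument names are ours:

```python
# Sketch of Eq. (1): L_total = alpha * L_label + (1 - alpha) * L_explanation.
import torch.nn.functional as F

def total_loss(label_logits, gold_labels, expl_logits, expl_targets,
               alpha=0.5, pad_id=0):
    # Classification NLL over the three SNLI labels.
    l_label = F.cross_entropy(label_logits, gold_labels)
    # Token-level NLL for the generated explanation, ignoring padding.
    l_expl = F.cross_entropy(
        expl_logits.reshape(-1, expl_logits.size(-1)),
        expl_targets.reshape(-1),
        ignore_index=pad_id,
    )
    # The paper sweeps alpha over 0.1 .. 0.9 in steps of 0.1.
    return alpha * l_label + (1 - alpha) * l_expl
```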