Reasoning about Entailment with Neural Attention

Authors: Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. Our benchmark LSTM achieves an accuracy of 80.9% on SNLI, outperforming a simple lexicalized classifier tailored to RTE by 2.7 percentage points. An extension with word-by-word neural attention surpasses this strong benchmark LSTM result by 2.6 percentage points, setting a new state-of-the-art accuracy of 83.5% for recognizing entailment on SNLI.
Researcher Affiliation | Collaboration | Tim Rocktäschel, University College London (t.rocktaschel@cs.ucl.ac.uk); Edward Grefenstette & Karl Moritz Hermann, Google DeepMind ({etg,kmh}@google.com); Tomáš Kočiský & Phil Blunsom, Google DeepMind & University of Oxford ({tkocisky,pblunsom}@google.com)
Pseudocode | No | The paper provides mathematical equations for the LSTM and attention mechanisms but does not include structured pseudocode or algorithm blocks. (A sketch of the attention equations appears below the table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | We conduct experiments on the Stanford Natural Language Inference corpus (SNLI, Bowman et al., 2015).
Dataset Splits | Yes | Subsequently, we take the best configuration based on performance on the validation set, and evaluate only that configuration on the test set. Table 1: Results on the SNLI corpus (Train / Dev / Test). (A data-loading sketch for these splits appears below the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | We use ADAM (Kingma and Ba, 2015) for optimization with a first momentum coefficient of 0.9 and a second momentum coefficient of 0.999. (The paper names the optimizer but does not specify software frameworks or versions.)
Experiment Setup | Yes | For every model we perform a small grid search over combinations of the initial learning rate [1E-4, 3E-4, 1E-3], dropout [0.0, 0.1, 0.2] and ℓ2 regularization strength [0.0, 1E-4, 3E-4, 1E-3]. (This search is sketched in code below the table.)
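
Since the paper specifies its model through equations rather than pseudocode, the word-by-word attention it reports can be summarized roughly as follows. This is a reconstruction of the paper's formulation rather than a verbatim copy; the notation (Y for the matrix of premise LSTM outputs, h_t for the hypothesis LSTM state at step t, e_L for a vector of ones, r_t for the attention-weighted premise representation, W and w for trained parameters) should be checked against the original equations:

    M_t      = \tanh\big(W^y Y + (W^h h_t + W^r r_{t-1}) \otimes e_L\big)
    \alpha_t = \mathrm{softmax}\big(w^\top M_t\big)
    r_t      = Y \alpha_t^\top + \tanh\big(W^t r_{t-1}\big)
    h^*      = \tanh\big(W^p r_N + W^x h_N\big)

The final representation h^*, combining the last attention-weighted representation r_N with the last hypothesis state h_N, is then fed to a softmax classifier over the three SNLI labels (entailment, neutral, contradiction).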
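
The train/dev/test split itself is part of the public SNLI release. Purely as an illustration of the split structure (this is not what the authors used in 2016), one way to obtain the three splits today is via the Hugging Face datasets library, assuming it is installed:

    from datasets import load_dataset

    # SNLI ships with fixed train/validation/test splits.
    snli = load_dataset("snli")
    for split in ("train", "validation", "test"):
        print(split, len(snli[split]))

    # Examples whose gold label was not agreed upon carry label == -1 and are
    # typically filtered out before training or evaluation.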
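
To make the stated experiment setup concrete, here is a minimal sketch of the described grid search. Only the three hyper-parameter grids and the ADAM momentum coefficients come from the paper; the placeholder train_and_evaluate function and the use of dev accuracy for model selection are illustrative assumptions, not the authors' code:

    import itertools

    # Hyper-parameter grids taken from the paper's experiment setup.
    learning_rates = [1e-4, 3e-4, 1e-3]
    dropout_rates = [0.0, 0.1, 0.2]
    l2_strengths = [0.0, 1e-4, 3e-4, 1e-3]

    def train_and_evaluate(lr, dropout, l2):
        """Placeholder: train with ADAM (beta1=0.9, beta2=0.999) using the given
        hyper-parameters and return accuracy on the SNLI dev set."""
        return 0.0  # replace with an actual training run

    best_config, best_dev_acc = None, float("-inf")
    for lr, dropout, l2 in itertools.product(learning_rates, dropout_rates, l2_strengths):
        dev_acc = train_and_evaluate(lr, dropout, l2)
        if dev_acc > best_dev_acc:
            best_config, best_dev_acc = (lr, dropout, l2), dev_acc

    # As in the paper, only the best dev configuration would then be evaluated
    # once on the test set.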