Reasoning about Entailment with Neural Attention

Authors: Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. Our benchmark LSTM achieves an accuracy of 80.9% on SNLI, outperforming a simple lexicalized classifier tailored to RTE by 2.7 percentage points. An extension with word-by-word neural attention surpasses this strong benchmark LSTM result by 2.6 percentage points, setting a new state-of-the-art accuracy of 83.5% for recognizing entailment on SNLI.
Researcher Affiliation | Collaboration | Tim Rocktäschel, University College London (t.rocktaschel@cs.ucl.ac.uk); Edward Grefenstette & Karl Moritz Hermann, Google DeepMind ({etg,kmh}@google.com); Tomáš Kočiský & Phil Blunsom, Google DeepMind & University of Oxford ({tkocisky,pblunsom}@google.com)
Pseudocode | No | The paper provides mathematical equations for the LSTM and attention mechanisms but does not include structured pseudocode or algorithm blocks. (A sketch of the attention equations appears below the table.)
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | We conduct experiments on the Stanford Natural Language Inference corpus (SNLI, Bowman et al., 2015).
Dataset Splits | Yes | Subsequently, we take the best configuration based on performance on the validation set, and evaluate only that configuration on the test set. Table 1: Results on the SNLI corpus (Train / Dev / Test). (A data-loading sketch for these splits appears below the table.)
Hardware Specification | No | The paper does not provide specific details on the hardware used for experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | We use ADAM (Kingma and Ba, 2015) for optimization with a first momentum coefficient of 0.9 and a second momentum coefficient of 0.999. (The paper names the optimizer but does not specify software frameworks or versions.)
Experiment Setup | Yes | For every model we perform a small grid search over combinations of the initial learning rate [1E-4, 3E-4, 1E-3], dropout [0.0, 0.1, 0.2] and ℓ2 regularization strength [0.0, 1E-4, 3E-4, 1E-3]. (This search is sketched in code below the table.)
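
Since the paper specifies its model through equations rather than pseudocode, the word-by-word attention it reports can be summarized roughly as follows. This is a reconstruction of the paper's formulation rather than a verbatim copy; the notation (Y for the matrix of premise LSTM outputs, h_t for the hypothesis LSTM state at step t, e_L for a vector of ones, r_t for the attention-weighted premise representation, W and w for trained parameters) should be checked against the original equations:

    M_t      = \tanh\big(W^y Y + (W^h h_t + W^r r_{t-1}) \otimes e_L\big)
    \alpha_t = \mathrm{softmax}\big(w^\top M_t\big)
    r_t      = Y \alpha_t^\top + \tanh\big(W^t r_{t-1}\big)
    h^*      = \tanh\big(W^p r_N + W^x h_N\big)

The final representation h^*, combining the last attention-weighted representation r_N with the last hypothesis state h_N, is then fed to a softmax classifier over the three SNLI labels (entailment, neutral, contradiction).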
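
The train/dev/test split itself is part of the public SNLI release. Purely as an illustration of the split structure (this is not what the authors used in 2016), one way to obtain the three splits today is via the Hugging Face datasets library, assuming it is installed:

    from datasets import load_dataset

    # SNLI ships with fixed train/validation/test splits.
    snli = load_dataset("snli")
    for split in ("train", "validation", "test"):
        print(split, len(snli[split]))

    # Examples whose gold label was not agreed upon carry label == -1 and are
    # typically filtered out before training or evaluation.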
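
To make the stated experiment setup concrete, here is a minimal sketch of the described grid search. Only the three hyper-parameter grids and the ADAM momentum coefficients come from the paper; the placeholder train_and_evaluate function and the use of dev accuracy for model selection are illustrative assumptions, not the authors' code:

    import itertools

    # Hyper-parameter grids taken from the paper's experiment setup.
    learning_rates = [1e-4, 3e-4, 1e-3]
    dropout_rates = [0.0, 0.1, 0.2]
    l2_strengths = [0.0, 1e-4, 3e-4, 1e-3]

    def train_and_evaluate(lr, dropout, l2):
        """Placeholder: train with ADAM (beta1=0.9, beta2=0.999) using the given
        hyper-parameters and return accuracy on the SNLI dev set."""
        return 0.0  # replace with an actual training run

    best_config, best_dev_acc = None, float("-inf")
    for lr, dropout, l2 in itertools.product(learning_rates, dropout_rates, l2_strengths):
        dev_acc = train_and_evaluate(lr, dropout, l2)
        if dev_acc > best_dev_acc:
            best_config, best_dev_acc = (lr, dropout, l2), dev_acc

    # As in the paper, only the best dev configuration would then be evaluated
    # once on the test set.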