Can Neural Networks Understand Logical Entailment?
Authors: Richard Evans, David Saxton, David Amos, Pushmeet Kohli, Edward Grefenstette
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new dataset of logical entailments for the purpose of measuring models' ability to capture and exploit the structure of logical expressions against an entailment prediction task. We use this task to compare a series of architectures which are ubiquitous in the sequence-processing literature, in addition to a new model class, PossibleWorldNets, which computes entailment as a "convolution over possible worlds". Results show that convolutional networks present the wrong inductive bias for this class of problems relative to LSTM RNNs, tree-structured neural networks outperform LSTM RNNs due to their enhanced ability to exploit the syntax of logic, and PossibleWorldNets outperform all benchmarks. |
| Researcher Affiliation | Industry | Edward Grefenstette DeepMind {richardevans,saxton,davidamos,pushmeet,etg}@google.com |
| Pseudocode | No | No pseudocode or algorithm block found. |
| Open Source Code | No | We aim to release the dataset used for experiments, and the code used to generate it according to the constraints discussed in this paper, upon publication of the paper. |
| Open Datasets | No | We aim to release the dataset used for experiments, and the code used to generate it according to the constraints discussed in this paper, upon publication of the paper. |
| Dataset Splits | Yes | We produced train, validation, and test (easy) by generating one large set of 4-tuples, and splitting them into groups of sizes 100000, 5000, and 5000. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) were found. |
| Software Dependencies | No | We implemented all architectures in TensorFlow (Abadi et al., 2016). |
| Experiment Setup | Yes | We implemented all architectures in TensorFlow (Abadi et al., 2016). We optimised all models with Adam (Kingma & Ba, 2014). We grid-searched across learning rates in [1e-5, 1e-4, 1e-3], minibatch sizes in [64, 128], and trained each model thrice with different random seeds. Per architecture, we grid-searched across specific hyperparameters as follows. We searched across 2 and 3 layer MLPs wherever an MLP existed in a benchmark, and across layer sizes in [32, 64] for MLP hidden layers, embedding sizes, and RNN cell size (where applicable). Additionally, for convolutional networks, we searched across the number of convolutional layers in [4, 6, 8], across kernel size in [5, 7, 9], across number of channels in [32, 64], and across pooling interval in [0, 5, 3, 1] (where 0 indicates no pooling). For the Transformer model, we searched across the number of encoder and decoder layers in the range [6, 8, 10], dropout probability in the range [0, 0.1, 0.5], and filter size in the range [128, 256, 384]. Finally, for all models, we ran them with and without the symbol permutation data augmentation technique described in Section 2.2. |
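
The abstract quoted in the Research Type row describes computing entailment as a "convolution over possible worlds". As a point of reference for the underlying task only (not the paper's neural PossibleWorldNets architecture), the sketch below checks propositional entailment by brute force over all truth assignments; the tuple-based formula encoding and function names are our own illustrative choices.

```python
from itertools import product

def evaluate(formula, world):
    """Evaluate a propositional formula under a truth assignment (a 'possible world').

    `formula` is a nested tuple, e.g. ('->', ('var', 'a'), ('or', ('var', 'a'), ('var', 'b'))).
    `world` maps variable names to booleans.
    """
    op = formula[0]
    if op == 'var':
        return world[formula[1]]
    if op == 'not':
        return not evaluate(formula[1], world)
    if op == 'and':
        return evaluate(formula[1], world) and evaluate(formula[2], world)
    if op == 'or':
        return evaluate(formula[1], world) or evaluate(formula[2], world)
    if op == '->':
        return (not evaluate(formula[1], world)) or evaluate(formula[2], world)
    raise ValueError(f"unknown operator: {op}")

def variables(formula, acc=None):
    """Collect the variable names occurring in a formula."""
    acc = set() if acc is None else acc
    if formula[0] == 'var':
        acc.add(formula[1])
    else:
        for sub in formula[1:]:
            variables(sub, acc)
    return acc

def entails(a, b):
    """A |= B iff B holds in every possible world in which A holds."""
    names = sorted(variables(a) | variables(b))
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if evaluate(a, world) and not evaluate(b, world):
            return False
    return True

# a |= (a or b), but (a or b) does not entail a.
A = ('var', 'a')
B = ('or', ('var', 'a'), ('var', 'b'))
print(entails(A, B))  # True
print(entails(B, A))  # False
```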
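
The Dataset Splits row reports one large pool of generated 4-tuples divided into train, validation, and test (easy) sets of 100000, 5000, and 5000 examples. A minimal sketch of such a split, assuming a seeded shuffle (the paper does not state how the pool was ordered before slicing):

```python
import random

def split_dataset(tuples, sizes=(100_000, 5_000, 5_000), seed=0):
    """Shuffle a pool of generated 4-tuples and slice it into
    train / validation / test(easy) partitions of the reported sizes."""
    pool = list(tuples)
    random.Random(seed).shuffle(pool)  # assumed: the paper does not specify shuffling
    train_end = sizes[0]
    valid_end = train_end + sizes[1]
    test_end = valid_end + sizes[2]
    assert len(pool) >= test_end, "not enough generated tuples to fill all splits"
    return pool[:train_end], pool[train_end:valid_end], pool[valid_end:test_end]
```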
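
The Experiment Setup row lists the shared hyperparameter grid (learning rates, minibatch sizes, layer sizes), Adam optimisation, and three random seeds per configuration. The sketch below shows one way such a grid search could be wired up; `GRID` mirrors only the shared part of the reported search space, and `train_and_validate` is a placeholder for the authors' unreleased TensorFlow training code.

```python
import itertools

# Hypothetical grid mirroring the shared search space reported above;
# train_and_validate(config) stands in for the actual training loop
# (TensorFlow model, Adam optimiser at config["learning_rate"]).
GRID = {
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "minibatch_size": [64, 128],
    "mlp_layers": [2, 3],
    "hidden_size": [32, 64],   # MLP hidden layers, embedding sizes, RNN cell size
    "seed": [0, 1, 2],         # each configuration trained thrice
}

def grid_search(train_and_validate):
    """Train every configuration in the grid and return the best one,
    judged by the validation score returned by train_and_validate(config)."""
    best_config, best_score = None, float("-inf")
    keys = sorted(GRID)
    for values in itertools.product(*(GRID[key] for key in keys)):
        config = dict(zip(keys, values))
        score = train_and_validate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```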