Can Neural Networks Understand Logical Entailment?
Authors: Richard Evans, David Saxton, David Amos, Pushmeet Kohli, Edward Grefenstette
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new dataset of logical entailments for the purpose of measuring models' ability to capture and exploit the structure of logical expressions against an entailment prediction task. We use this task to compare a series of architectures which are ubiquitous in the sequence-processing literature, in addition to a new model class, PossibleWorldNets, which computes entailment as a "convolution over possible worlds". Results show that convolutional networks present the wrong inductive bias for this class of problems relative to LSTM RNNs, tree-structured neural networks outperform LSTM RNNs due to their enhanced ability to exploit the syntax of logic, and PossibleWorldNets outperform all benchmarks. |
| Researcher Affiliation | Industry | Edward Grefenstette DeepMind {richardevans,saxton,davidamos,pushmeet,etg}@google.com |
| Pseudocode | No | No pseudocode or algorithm block found. |
| Open Source Code | No | We aim to release the dataset used for experiments, and the code used to generate it according to the constraints discussed in this paper, upon publication of the paper. |
| Open Datasets | No | We aim to release the dataset used for experiments, and the code used to generate it according to the constraints discussed in this paper, upon publication of the paper. |
| Dataset Splits | Yes | We produced train, validation, and test (easy) by generating one large set of 4-tuples, and splitting them into groups of sizes 100000, 5000, and 5000. |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) were found. |
| Software Dependencies | No | We implemented all architectures in TensorFlow (Abadi et al., 2016). |
| Experiment Setup | Yes | We implemented all architectures in TensorFlow (Abadi et al., 2016). We optimised all models with Adam (Kingma & Ba, 2014). We grid-searched across learning rates in [1e-5, 1e-4, 1e-3], minibatch sizes in [64, 128], and trained each model thrice with different random seeds. Per architecture, we grid-searched across specific hyperparameters as follows. We searched across 2 and 3 layer MLPs wherever an MLP existed in a benchmark, and across layer sizes in [32, 64] for MLP hidden layers, embedding sizes, and RNN cell size (where applicable). Additionally, for convolutional networks, we searched across the number of convolutional layers in [4, 6, 8], across kernel size in [5, 7, 9], across number of channels in [32, 64], and across pooling interval in [0, 5, 3, 1] (where 0 indicates no pooling). For the Transformer model, we searched across the number of encoder and decoder layers in the range [6, 8, 10], dropout probability in the range [0, 0.1, 0.5], and filter size in the range [128, 256, 384]. Finally, for all models, we ran them with and without the symbol permutation data augmentation technique described in Section 2.2. |
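
The abstract quoted in the Research Type row describes computing entailment as a "convolution over possible worlds". As a point of reference for the underlying task only (not the paper's neural PossibleWorldNets architecture), the sketch below checks propositional entailment by brute force over all truth assignments; the tuple-based formula encoding and function names are our own illustrative choices.

```python
from itertools import product

def evaluate(formula, world):
    """Evaluate a propositional formula under a truth assignment (a 'possible world').

    `formula` is a nested tuple, e.g. ('->', ('var', 'a'), ('or', ('var', 'a'), ('var', 'b'))).
    `world` maps variable names to booleans.
    """
    op = formula[0]
    if op == 'var':
        return world[formula[1]]
    if op == 'not':
        return not evaluate(formula[1], world)
    if op == 'and':
        return evaluate(formula[1], world) and evaluate(formula[2], world)
    if op == 'or':
        return evaluate(formula[1], world) or evaluate(formula[2], world)
    if op == '->':
        return (not evaluate(formula[1], world)) or evaluate(formula[2], world)
    raise ValueError(f"unknown operator: {op}")

def variables(formula, acc=None):
    """Collect the variable names occurring in a formula."""
    acc = set() if acc is None else acc
    if formula[0] == 'var':
        acc.add(formula[1])
    else:
        for sub in formula[1:]:
            variables(sub, acc)
    return acc

def entails(a, b):
    """A |= B iff B holds in every possible world in which A holds."""
    names = sorted(variables(a) | variables(b))
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if evaluate(a, world) and not evaluate(b, world):
            return False
    return True

# a |= (a or b), but (a or b) does not entail a.
A = ('var', 'a')
B = ('or', ('var', 'a'), ('var', 'b'))
print(entails(A, B))  # True
print(entails(B, A))  # False
```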
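
The Dataset Splits row reports one large pool of generated 4-tuples divided into train, validation, and test (easy) sets of 100000, 5000, and 5000 examples. A minimal sketch of such a split, assuming a seeded shuffle (the paper does not state how the pool was ordered before slicing):

```python
import random

def split_dataset(tuples, sizes=(100_000, 5_000, 5_000), seed=0):
    """Shuffle a pool of generated 4-tuples and slice it into
    train / validation / test(easy) partitions of the reported sizes."""
    pool = list(tuples)
    random.Random(seed).shuffle(pool)  # assumed: the paper does not specify shuffling
    train_end = sizes[0]
    valid_end = train_end + sizes[1]
    test_end = valid_end + sizes[2]
    assert len(pool) >= test_end, "not enough generated tuples to fill all splits"
    return pool[:train_end], pool[train_end:valid_end], pool[valid_end:test_end]
```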
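
The Experiment Setup row lists the shared hyperparameter grid (learning rates, minibatch sizes, layer sizes), Adam optimisation, and three random seeds per configuration. The sketch below shows one way such a grid search could be wired up; `GRID` mirrors only the shared part of the reported search space, and `train_and_validate` is a placeholder for the authors' unreleased TensorFlow training code.

```python
import itertools

# Hypothetical grid mirroring the shared search space reported above;
# train_and_validate(config) stands in for the actual training loop
# (TensorFlow model, Adam optimiser at config["learning_rate"]).
GRID = {
    "learning_rate": [1e-5, 1e-4, 1e-3],
    "minibatch_size": [64, 128],
    "mlp_layers": [2, 3],
    "hidden_size": [32, 64],   # MLP hidden layers, embedding sizes, RNN cell size
    "seed": [0, 1, 2],         # each configuration trained thrice
}

def grid_search(train_and_validate):
    """Train every configuration in the grid and return the best one,
    judged by the validation score returned by train_and_validate(config)."""
    best_config, best_score = None, float("-inf")
    keys = sorted(GRID)
    for values in itertools.product(*(GRID[key] for key in keys)):
        config = dict(zip(keys, values))
        score = train_and_validate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```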