Learning Reasoning Strategies in End-to-End Differentiable Proving
Authors: Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp, Edward Grefenstette, Tim Rocktäschel
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CTPs on two datasets: systematic generalisation on the CLUTRR dataset, and link prediction in Knowledge Graphs (KGs). We show that CTPs are scalable and yield state-of-the-art results on the CLUTRR dataset, which tests systematic generalisation of neural models by learning to reason over smaller graphs and evaluating on larger ones. Finally, CTPs show better link prediction results on standard benchmarks in comparison with other neural-symbolic models, while being explainable. |
| Researcher Affiliation | Collaboration | 1UCL Centre for Artificial Intelligence, University College London 2Facebook AI Research. Correspondence to: Pasquale Minervini <p.minervini@ucl.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 gives an overview of the neural backward chaining algorithm proposed by Rocktäschel & Riedel (2017): intuitively, it recursively proves each goal with all rules in the KB (OR module) and, for each rule, proves its premise (AND module), up to d recursion steps. Algorithm 2: in Conditional Theorem Provers, the set of rules is conditioned on the goal G. (A minimal symbolic sketch of the OR/AND recursion appears below the table.) |
| Open Source Code | Yes | All source code and datasets are available online at https://github.com/uclnlp/ctp |
| Open Datasets | Yes | Systematic Generalisation: CLUTRR (Compositional Language Understanding and Text-based Relational Reasoning; Sinha et al., 2019) contains a large set of graphs modelling hypothetical family relationships. Link Prediction: We also evaluate CTPs on neural link prediction tasks, following the same evaluation protocols as Rocktäschel & Riedel (2017) on the Countries (Bouchard et al., 2015), Nations, UMLS, and Kinship (Kemp et al., 2006) datasets. (A sketch of the standard ranking-metric protocol appears below the table.) |
| Dataset Splits | Yes | For model selection, we generate a CLUTRR-like dataset using the code published by Sinha et al. (2019), composed of training set graphs with {2, 3} edges and two validation sets, one with graphs with three edges and another with graphs with nine edges. During training, a model is trained to infer the target relationship by traversing a limited number of edges (such as two, three, and four edges), and during evaluation the model has to traverse up to ten edges. (A sketch of this size-based split appears below the table.) |
| Hardware Specification | No | The acknowledgements thank NVIDIA for GPU donations, but the paper does not specify the models or quantities of GPUs, CPUs, or other hardware used for the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For each of the baselines, we considered a wide range of hyperparameters: the dimensionalities of node and edge embeddings were varied in {10, 50, 100, 200, 500}, the number of attention heads in attention-based architectures in {1, 2, ..., 10}, the number of filters in convolutional architectures in {1, 2, ..., 10}, and the number of hidden units in recurrent architectures in {32, 64, 128, 256, 512}. All details on the hyperparameter selection process can be found in Appendix A. (A sketch of this search grid appears below the table.) |
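The Pseudocode row quotes the captions of Algorithms 1 and 2. To make the OR/AND recursion concrete, here is a minimal symbolic sketch in Python. The representation is an assumption made for illustration: atoms are tuples, variables are uppercase strings, and unification is exact symbol matching, whereas the actual model replaces symbolic unification with soft, embedding-based unification and propagates proof scores rather than only substitutions.

```python
# Minimal symbolic sketch of the OR/AND backward-chaining recursion from
# Algorithm 1. Assumptions for illustration: atoms are tuples, variables
# are uppercase strings, and unification is exact matching; CTPs instead
# use soft unification over embeddings and yield proof scores.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def unify(goal, head, subst):
    """Extend `subst` so that `goal` and `head` match, or return None."""
    if len(goal) != len(head):
        return None
    subst = dict(subst)
    for g, h in zip(goal, head):
        g, h = subst.get(g, g), subst.get(h, h)
        if g == h:
            continue
        if is_var(h):
            subst[h] = g
        elif is_var(g):
            subst[g] = h
        else:
            return None
    return subst

def substitute(atom, subst):
    return tuple(subst.get(t, t) for t in atom)

def or_module(goal, kb, depth, subst):
    """OR: try to prove `goal` with every rule (head, body) in the KB."""
    if depth < 0:
        return
    for head, body in kb:
        new_subst = unify(goal, head, subst)
        if new_subst is not None:
            yield from and_module(body, kb, depth, new_subst)

def and_module(body, kb, depth, subst):
    """AND: prove each atom of a rule body, threading substitutions."""
    if not body:
        yield subst
        return
    first, rest = body[0], body[1:]
    for s in or_module(substitute(first, subst), kb, depth - 1, subst):
        yield from and_module(rest, kb, depth, s)

# Tiny example KB: facts are rules with an empty body.
kb = [
    (("fatherOf", "abe", "homer"), []),
    (("parentOf", "homer", "bart"), []),
    (("grandpaOf", "X", "Y"), [("fatherOf", "X", "Z"), ("parentOf", "Z", "Y")]),
]
proof = next(or_module(("grandpaOf", "abe", "Q"), kb, 2, {}), None)
print(proof)  # binds Q to 'bart' via Z = 'homer'
```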
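The Open Datasets row refers to the link prediction evaluation protocol of Rocktäschel & Riedel (2017). As a hedged reference point, the sketch below computes the standard ranking metrics (MRR and Hits@k) reported on such benchmarks; the `score` function and the object-only corruption scheme are assumptions, not the paper's exact code, which may also corrupt subjects and apply filtering.

```python
# Hedged sketch of standard link-prediction ranking metrics (MRR, Hits@k).
# Assumptions: `score(s, p, o)` returns a real-valued triple score, and
# only the object position is corrupted.

def object_rank(score, s, p, o, entities):
    """Rank of the true object among all candidate replacements."""
    true_score = score(s, p, o)
    # Rank = 1 + number of corrupted triples scoring strictly higher.
    return 1 + sum(score(s, p, e) > true_score for e in entities if e != o)

def mrr_and_hits(score, test_triples, entities, k=10):
    ranks = [object_rank(score, s, p, o, entities)
             for (s, p, o) in test_triples]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits_at_k
```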
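The Dataset Splits row describes training on graphs with {2, 3} edges and validating on two pools with 3-edge and 9-edge graphs. A sketch of that setup, assuming a hypothetical `generate(num_edges, n)` wrapper around the generator published by Sinha et al. (2019):

```python
# Sketch of the model-selection splits described above. `generate` is a
# hypothetical stand-in for the CLUTRR generator of Sinha et al. (2019):
# it returns `n` examples whose graphs have exactly `num_edges` edges.

def make_splits(generate, n_train, n_valid):
    # Training graphs have two or three edges.
    train = generate(num_edges=2, n=n_train) + generate(num_edges=3, n=n_train)
    # Two validation pools: in-distribution (3 edges) and
    # systematic-generalisation (9 edges).
    valid_small = generate(num_edges=3, n=n_valid)
    valid_large = generate(num_edges=9, n=n_valid)
    return train, valid_small, valid_large
```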
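The Experiment Setup row lists the hyperparameter grid considered for the baselines. The sketch below enumerates that grid and draws random configurations from it; random sampling is an assumption made for illustration, and the paper's actual selection procedure is detailed in its Appendix A.

```python
# Sketch of the baseline hyperparameter grid quoted in the Experiment
# Setup row; random search over it is an assumed selection strategy.

import itertools
import random

grid = {
    "embedding_size": [10, 50, 100, 200, 500],   # node/edge embeddings
    "attention_heads": list(range(1, 11)),       # attention-based baselines
    "conv_filters": list(range(1, 11)),          # convolutional baselines
    "hidden_units": [32, 64, 128, 256, 512],     # recurrent baselines
}

def sample_configs(grid, k, seed=0):
    """Draw `k` random configurations from the full Cartesian grid."""
    keys = list(grid)
    all_configs = [dict(zip(keys, values))
                   for values in itertools.product(*grid.values())]
    return random.Random(seed).sample(all_configs, k)

for config in sample_configs(grid, k=3):
    print(config)
```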