NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning
Authors: Nathaniel Weir, Peter Clark, Benjamin Van Durme
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, NELLIE outperforms a similar-sized state-of-the-art reasoner while producing knowledge-grounded explanations. We also find NELLIE can exploit both semi-structured and NL text corpora to guide reasoning. |
| Researcher Affiliation | Collaboration | Nathaniel Weir1, Peter Clark2, and Benjamin Van Durme1 1Johns Hopkins University, Baltimore, MD, USA 2Allen Institute for AI, Seattle, WA, USA |
| Pseudocode | Yes | Full pseudocode for the algorithm, which follows a depth-first search with a breadth-first lookahead [Stern et al., 2010] to check for the unification of generated subgoals, can be found in Appendix E. (A minimal sketch of this search pattern follows the table.) |
| Open Source Code | Yes | 1Code and appendix at https://github.com/JHU-CLSP/NELLIE. |
| Open Datasets | Yes | In our experiments, we consider one implementation of this framework that uses the corpus WorldTree [Xie et al., 2020], a set of 9K NL science facts... We evaluate models on two multiple-choice QA datasets constructed so that correct answers are supported by facts in the WorldTree corpus: EntailmentBank [Dalvi et al., 2021]... WorldTree [Xie et al., 2020]... We consider OpenBookQA [Mihaylov et al., 2018]... |
| Dataset Splits | No | The paper mentions using specific datasets (Entailment Bank, World Tree QA, Open Book QA) and refers to a 'test set' for evaluation. However, it does not explicitly provide the training, validation, and test dataset splits with percentages or sample counts for its own experiments in the main text. |
| Hardware Specification | No | The paper mentions models like 'T5-3B model' but does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'T5-3B model' and a 'FAISS [Johnson et al., 2019]-based nearest neighbor dense retrieval index', as well as Prolog. However, no specific version numbers are provided for these software components or any other libraries or frameworks. (A minimal sketch of such a FAISS index follows the table.) |
| Experiment Setup | Yes | It is parameterized by 1. A maximum number of proofs m at which to cut off searching. In experiments, we set this to 10 for top-level queries and 2 for recursive subqueries. 2. A number of support facts nf to retrieve at each call to RETRIEVEK, which we set to 15. 3. Candidate generation rates nv for vanilla nucleus-sampled decompositions, nt for template-conditioned decompositions, and nr for retrieval-conditioned generations. We set these each to 40. ... NELLIE searches for up to p=10 proofs of max depth d=5 with a timeout of t=180 seconds per option. (These hyperparameters are collected into a config sketch after the table.) |
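The Pseudocode row quotes the paper's description of its search: depth-first over generated decompositions, with a breadth-first lookahead that checks whether subgoals unify with corpus facts. Below is a minimal sketch of that search pattern, not the paper's actual pseudocode (which is in Appendix E and the NELLIE repository); `decompose` and `fact_match` are hypothetical stand-ins for NELLIE's LM-based decomposition generators and corpus unification step.

```python
def prove(goal, depth, max_depth, decompose, fact_match):
    """Return a proof tree (goal, children) for `goal`, or None if none is found."""
    if fact_match(goal):                  # base case: goal unifies with a corpus fact
        return (goal, [])
    if depth >= max_depth:
        return None
    for subgoals in decompose(goal):      # candidate decompositions of the goal
        # Breadth-first lookahead: subgoals that unify with facts are closed
        # immediately; a branch whose open subgoals exceed the remaining
        # depth budget is pruned before any recursion happens.
        open_goals = [g for g in subgoals if not fact_match(g)]
        if open_goals and depth + 1 >= max_depth:
            continue
        children = [(g, []) for g in subgoals if fact_match(g)]
        failed = False
        for g in open_goals:              # depth-first descent over open subgoals
            sub = prove(g, depth + 1, max_depth, decompose, fact_match)
            if sub is None:
                failed = True
                break
            children.append(sub)
        if not failed:
            return (goal, children)
    return None


# Toy usage: two corpus facts and one hand-written decomposition stand in
# for NELLIE's retrieval corpus and LM generators.
FACTS = {"a pot is made of metal", "metal conducts heat"}
RULES = {"a pot conducts heat": [["a pot is made of metal", "metal conducts heat"]]}

proof = prove(
    "a pot conducts heat", 0, 5,
    decompose=lambda g: RULES.get(g, []),
    fact_match=lambda g: g in FACTS,
)
print(proof)
# ('a pot conducts heat', [('a pot is made of metal', []), ('metal conducts heat', [])])
```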
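The Software Dependencies row names a FAISS-based nearest-neighbor dense retrieval index without versions. The sketch below shows one plausible realization of such an index over fact embeddings; the encoder model here is a placeholder assumption, not NELLIE's documented setup.

```python
import faiss
from sentence_transformers import SentenceTransformer

facts = ["metal is a thermal conductor", "a pot is made of metal"]
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder, not NELLIE's

# Normalized embeddings make inner product equal to cosine similarity.
vecs = encoder.encode(facts, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(vecs.shape[1])
index.add(vecs)

query = encoder.encode(["metal conducts heat"], normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 2)  # top-2 nearest facts
print([(facts[i], round(float(s), 3)) for i, s in zip(ids[0], scores[0])])
```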
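For reference, the hyperparameters quoted in the Experiment Setup row can be bundled into a single config object as below; the field names are illustrative and not taken from the NELLIE codebase.

```python
from dataclasses import dataclass


@dataclass
class SearchConfig:
    max_proofs_top: int = 10      # m: proof cutoff for top-level queries
    max_proofs_sub: int = 2       # m: proof cutoff for recursive subqueries
    n_retrieved_facts: int = 15   # nf: facts retrieved per RETRIEVEK call
    n_vanilla: int = 40           # nv: vanilla nucleus-sampled decompositions
    n_template: int = 40          # nt: template-conditioned decompositions
    n_retrieval: int = 40         # nr: retrieval-conditioned generations
    max_depth: int = 5            # d: maximum proof depth
    timeout_s: int = 180          # t: per-option search timeout (seconds)
```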