Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neural Graph Reasoning: A Survey on Complex Logical Query Answering
Authors: Hongyu Ren, Mikhail Galkin, Zhaocheng Zhu, Jure Leskovec, Michael Cochez
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Multiple datasets have been proposed for evaluation of query reasoning models. Here we introduce the common setup for CLQA task. Given a knowledge graph G = (E, R, S), the standard practice is to split G into a training graph Gtrain, a validation graph Gval and a test graph Gtest (simulating the unobserved complete graph Ĝ from Section 2). The standard experiment protocol is to train a query reasoning model only on the training graph Gtrain, and evaluate the model on answering queries over the validation graph Gval and the test graph Gtest. [...] Several metrics have been proposed to evaluate the performance of query reasoning models that can be broadly classified into generalization, entailment, and query representation quality metrics. |
| Researcher Affiliation | Collaboration | Hongyu Ren*¹, Mikhail Galkin*², Zhaocheng Zhu³, Jure Leskovec¹, Michael Cochez⁴ (*equal contribution). ¹Stanford University, ²Intel AI Lab, ³Mila Québec AI Institute and Université de Montréal, ⁴Vrije Universiteit Amsterdam and Elsevier Discovery Lab, Amsterdam, the Netherlands |
| Pseudocode | No | The paper is a survey that describes methods conceptually and uses figures to illustrate components (e.g., Figure 7: Neural Query Execution through the Encoder-Processor-Decoder modules) but does not contain any explicit pseudocode or algorithm blocks. It focuses on reviewing existing work rather than presenting a new algorithm with structured steps. |
| Open Source Code | No | The paper is a survey and does not present new methods or implementations by the authors. Therefore, it does not contain an explicit statement from the authors about releasing their source code, nor does it provide a link to a code repository for the methodology described. |
| Open Datasets | Yes | Multiple datasets have been proposed for evaluation of query reasoning models. ... BetaE datasets include sets of queries from denser Freebase (Bollacker et al., 2008) with average node degree of 18 and sparser WordNet (Miller, 1998) and NELL (Mitchell et al., 2015). Hyper-relational datasets WD50K (Alivanistos et al., 2022) and WD50K-NFOL (Luo et al., 2023) were sampled from Wikidata (Vrandecic & Krötzsch, 2014)... |
| Dataset Splits | Yes | Given a knowledge graph G = (E, R, S), the standard practice is to split G into a training graph Gtrain, a validation graph Gval and a test graph Gtest (simulating the unobserved complete graph Ĝ from Section 2). The standard experiment protocol is to train a query reasoning model only on the training graph Gtrain, and evaluate the model on answering queries over the validation graph Gval and the test graph Gtest. |
| Hardware Specification | No | As a survey paper, the document describes existing research and evaluation practices without conducting new experiments. Therefore, it does not provide specific details about the hardware used to run experiments, such as exact GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | As a survey paper, the document focuses on reviewing methodologies rather than implementing new ones. Consequently, it does not specify particular software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | No | As a survey paper, this document describes the experimental setups and evaluation methodologies found in the literature (e.g., Section 7.3 Training, which discusses training objectives of *other* methods). It does not present specific hyperparameters or training configurations for its own experiments, as it does not conduct new experimental research. |
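The split protocol quoted in the Dataset Splits row above is typically nested: the validation graph contains all training edges plus held-out validation edges, and the test graph adds further held-out test edges. The following is a minimal sketch of that convention; the function name, split fractions, and triple format are illustrative assumptions, not taken from the paper.

```python
import random

def split_kg(triples, val_frac=0.1, test_frac=0.1, seed=0):
    """Split a set of KG triples into nested train/valid/test graphs.

    Sketch of the common CLQA convention: G_train is a subset of G_val,
    which is a subset of G_test (the fractions here are illustrative).
    """
    rng = random.Random(seed)
    triples = sorted(triples)
    rng.shuffle(triples)
    n = len(triples)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test_edges = triples[:n_test]
    val_edges = triples[n_test:n_test + n_val]
    train_edges = triples[n_test + n_val:]

    g_train = set(train_edges)
    g_val = g_train | set(val_edges)    # G_train ⊆ G_val
    g_test = g_val | set(test_edges)    # G_val ⊆ G_test
    return g_train, g_val, g_test

# Toy graph: a chain of 100 (head, relation, tail) triples.
triples = {(f"e{i}", "r", f"e{i+1}") for i in range(100)}
g_train, g_val, g_test = split_kg(triples)
assert g_train <= g_val <= g_test  # nested as in the standard protocol
```

Queries for each split are then generated so that validation and test queries require at least one edge unseen during training, which is what the survey's "generalization" metrics measure.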