A Foundation Model for Zero-shot Logical Query Reasoning
Authors: Mikhail Galkin, Jincheng Zhou, Bruno Ribeiro, Jian Tang, Zhaocheng Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimenting on 23 datasets, ULTRAQUERY in the zero-shot inference mode shows competitive or better query answering performance than best available baselines and sets a new state of the art on 15 of them. |
| Researcher Affiliation | Collaboration | Mikhail Galkin1, Jincheng Zhou2, Bruno Ribeiro2, Jian Tang3,4,5, Zhaocheng Zhu3,6; 1Intel AI Lab, 2Purdue University, 3Mila – Québec AI Institute, 4HEC Montréal, 5CIFAR AI Chair, 6Université de Montréal |
| Pseudocode | No | The paper describes the model architecture and training process in text and with diagrams, but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code: https://github.com/DeepGraphLearning/ULTRA |
| Open Datasets | Yes | We employ 23 different CLQA datasets each with 14 standard query types and its own underlying KG with different sets of entities and relations. Following Section 3, we categorize the datasets into three groups (more statistics of the datasets and queries are provided in Appendix A): Transductive (3 datasets) [...] FB15k237, NELL995 and FB15k all from Ren and Leskovec [25] [...]; Inductive entity (e) (9 datasets) from Galkin et al. [14] [...]; Inductive entity and relation (e, r) (11 datasets): we sampled a novel suite of WikiTopics-QA datasets due to the absence of standard benchmarks evaluating the hardest inductive setup where inference graphs have both new entities and relations (G_train ≠ G_inf). More details on the dataset creation procedure are in Appendix A. |
| Dataset Splits | Yes | As common in the literature [25, 27], the answer set of each query is split into easy and hard answers. Easy answers are reachable by graph traversal and do not require inferring missing links, whereas hard answers involve at least one edge to be predicted at inference. In the rank-based evaluation, we only consider ranks of hard answers, filter out easy ones, and report filtered Mean Reciprocal Rank (MRR) and Hits@10 as the main performance metrics. A minimal code sketch of this filtered evaluation follows the table. |
| Hardware Specification | Yes | ULTRAQUERY was trained on one FB15k237 dataset with complex queries for 10,000 steps with batch size of 32 on 4 RTX 3090 GPUs for 2 hours (8 GPU-hours in total). |
| Software Dependencies | No | The libraries are named but no version numbers are given: Both ULTRAQUERY and ULTRAQUERY LP are implemented with PyTorch [24] (BSD-style license) and PyTorch-Geometric [12] (MIT license). |
| Experiment Setup | Yes | ULTRAQUERY was trained on one FB15k237 dataset with complex queries for 10,000 steps with batch size of 32 on 4 RTX 3090 GPUs for 2 hours (8 GPU-hours in total). We initialize the model weights with an available checkpoint of ULTRA reported in Galkin et al. [15]. Following the standard setup in the literature, we train the model on 10 query types and evaluate on all 14 patterns. We employ product t-norm and t-conorm as non-parametric fuzzy logic operators to implement conjunction (∧) and disjunction (∨), respectively, and use a simple 1 − x negation. Sketches of the fuzzy operators and the assumed query-type split follow the table. |
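
The Dataset Splits row describes filtered rank-based evaluation over hard answers. Below is a minimal sketch of that metric, assuming `scores` holds the model's scores over all entities for a single query; the function name and tensor layout are illustrative, not taken from the released ULTRA codebase.

```python
import torch

def filtered_metrics(scores: torch.Tensor,
                     easy_answers: torch.Tensor,
                     hard_answers: torch.Tensor):
    """Filtered MRR and Hits@10 over the hard answers of one query.

    scores:       (num_entities,) model scores, higher is better.
    easy_answers: entity ids reachable by graph traversal (filtered out).
    hard_answers: entity ids requiring at least one predicted link.
    """
    mrr, hits10 = [], []
    for answer in hard_answers:
        # Filtered setting: mask every other true answer (easy or hard)
        # so it cannot inflate the rank of the current one.
        others = torch.cat([easy_answers, hard_answers])
        others = others[others != answer]
        filtered = scores.clone()
        filtered[others] = float("-inf")
        # Rank = 1 + number of entities scored strictly higher.
        rank = 1 + (filtered > filtered[answer]).sum().item()
        mrr.append(1.0 / rank)
        hits10.append(1.0 if rank <= 10 else 0.0)
    return sum(mrr) / len(mrr), sum(hits10) / len(hits10)
```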
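
The Experiment Setup row fully specifies the fuzzy logic: product t-norm for conjunction, its dual t-conorm for disjunction, and 1 − x for negation. A direct sketch, assuming scores are per-entity fuzzy truth values in [0, 1]:

```python
import torch

def conjunction(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Product t-norm: fuzzy AND of two score vectors in [0, 1].
    return x * y

def disjunction(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Product t-conorm (probabilistic sum): fuzzy OR.
    return x + y - x * y

def negation(x: torch.Tensor) -> torch.Tensor:
    # The simple 1 - x negation stated in the paper.
    return 1.0 - x
```

Because all three operators are non-parametric, the logical structure of a query adds no trainable weights; only the underlying link predictor carries parameters.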
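
The quote does not enumerate the "10 query types" and "14 patterns". The split below assumes the standard BetaE-style protocol of Ren and Leskovec [25], which the paper follows; the constant names are illustrative.

```python
# Assumed BetaE-style query-type split (Ren and Leskovec [25]).
TRAIN_QUERY_TYPES = [
    "1p", "2p", "3p", "2i", "3i",        # projection and intersection patterns
    "2in", "3in", "inp", "pin", "pni",   # patterns with negation
]
EVAL_ONLY_QUERY_TYPES = ["ip", "pi", "2u", "up"]  # combinations unseen in training
ALL_QUERY_TYPES = TRAIN_QUERY_TYPES + EVAL_ONLY_QUERY_TYPES  # 14 in total
```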