A Foundation Model for Zero-shot Logical Query Reasoning
Authors: Mikhail Galkin, Jincheng Zhou, Bruno Ribeiro, Jian Tang, Zhaocheng Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimenting on 23 datasets, ULTRAQUERY in the zero-shot inference mode shows competitive or better query answering performance than best available baselines and sets a new state of the art on 15 of them. |
| Researcher Affiliation | Collaboration | Mikhail Galkin1, Jincheng Zhou2, Bruno Ribeiro2, Jian Tang3,4,5, Zhaocheng Zhu3,6; 1Intel AI Lab, 2Purdue University, 3Mila – Québec AI Institute, 4HEC Montréal, 5CIFAR AI Chair, 6Université de Montréal |
| Pseudocode | No | The paper describes the model architecture and training process in text and with diagrams, but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code: https://github.com/DeepGraphLearning/ULTRA |
| Open Datasets | Yes | We employ 23 different CLQA datasets each with 14 standard query types and its own underlying KG with different sets of entities and relations. Following Section 3, we categorize the datasets into three groups (more statistics of the datasets and queries are provided in Appendix A): Transductive (3 datasets) [...] FB15k237, NELL995 and FB15k all from Ren and Leskovec [25] [...]; Inductive entity (e) (9 datasets) from Galkin et al. [14] [...]; Inductive entity and relation (e, r) (11 datasets): we sampled a novel suite of WikiTopics-QA datasets due to the absence of standard benchmarks evaluating the hardest inductive setup where inference graphs have both new entities and relations (G_train ≠ G_inf). More details on the dataset creation procedure are in Appendix A. |
| Dataset Splits | Yes | As common in the literature [25, 27], the answer set of each query is split into easy and hard answers. Easy answers are reachable by graph traversal and do not require inferring missing links, whereas hard answers involve at least one edge to be predicted at inference. In the rank-based evaluation, we only consider ranks of hard answers, filter out easy ones, and report filtered Mean Reciprocal Rank (MRR) and Hits@10 as the main performance metrics. A minimal code sketch of this filtered evaluation follows the table. |
| Hardware Specification | Yes | ULTRAQUERY was trained on one FB15k237 dataset with complex queries for 10,000 steps with batch size of 32 on 4 RTX 3090 GPUs for 2 hours (8 GPU-hours in total). |
| Software Dependencies | No | The libraries are named but no version numbers are given: Both ULTRAQUERY and ULTRAQUERY LP are implemented with PyTorch [24] (BSD-style license) and PyTorch-Geometric [12] (MIT license). |
| Experiment Setup | Yes | ULTRAQUERY was trained on one FB15k237 dataset with complex queries for 10,000 steps with batch size of 32 on 4 RTX 3090 GPUs for 2 hours (8 GPU-hours in total). We initialize the model weights with an available checkpoint of ULTRA reported in Galkin et al. [15]. Following the standard setup in the literature, we train the model on 10 query types and evaluate on all 14 patterns. We employ product t-norm and t-conorm as non-parametric fuzzy logic operators to implement conjunction (∧) and disjunction (∨), respectively, and use a simple 1 − x negation. Sketches of the fuzzy operators and the assumed query-type split follow the table. |
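
The Dataset Splits row describes filtered rank-based evaluation over hard answers. Below is a minimal sketch of that metric, assuming `scores` holds the model's scores over all entities for a single query; the function name and tensor layout are illustrative, not taken from the released ULTRA codebase.

```python
import torch

def filtered_metrics(scores: torch.Tensor,
                     easy_answers: torch.Tensor,
                     hard_answers: torch.Tensor):
    """Filtered MRR and Hits@10 over the hard answers of one query.

    scores:       (num_entities,) model scores, higher is better.
    easy_answers: entity ids reachable by graph traversal (filtered out).
    hard_answers: entity ids requiring at least one predicted link.
    """
    mrr, hits10 = [], []
    for answer in hard_answers:
        # Filtered setting: mask every other true answer (easy or hard)
        # so it cannot inflate the rank of the current one.
        others = torch.cat([easy_answers, hard_answers])
        others = others[others != answer]
        filtered = scores.clone()
        filtered[others] = float("-inf")
        # Rank = 1 + number of entities scored strictly higher.
        rank = 1 + (filtered > filtered[answer]).sum().item()
        mrr.append(1.0 / rank)
        hits10.append(1.0 if rank <= 10 else 0.0)
    return sum(mrr) / len(mrr), sum(hits10) / len(hits10)
```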
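
The Experiment Setup row fully specifies the fuzzy logic: product t-norm for conjunction, its dual t-conorm for disjunction, and 1 − x for negation. A direct sketch, assuming scores are per-entity fuzzy truth values in [0, 1]:

```python
import torch

def conjunction(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Product t-norm: fuzzy AND of two score vectors in [0, 1].
    return x * y

def disjunction(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Product t-conorm (probabilistic sum): fuzzy OR.
    return x + y - x * y

def negation(x: torch.Tensor) -> torch.Tensor:
    # The simple 1 - x negation stated in the paper.
    return 1.0 - x
```

Because all three operators are non-parametric, the logical structure of a query adds no trainable weights; only the underlying link predictor carries parameters.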
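
The quote does not enumerate the "10 query types" and "14 patterns". The split below assumes the standard BetaE-style protocol of Ren and Leskovec [25], which the paper follows; the constant names are illustrative.

```python
# Assumed BetaE-style query-type split (Ren and Leskovec [25]).
TRAIN_QUERY_TYPES = [
    "1p", "2p", "3p", "2i", "3i",        # projection and intersection patterns
    "2in", "3in", "inp", "pin", "pni",   # patterns with negation
]
EVAL_ONLY_QUERY_TYPES = ["ip", "pi", "2u", "up"]  # combinations unseen in training
ALL_QUERY_TYPES = TRAIN_QUERY_TYPES + EVAL_ONLY_QUERY_TYPES  # 14 in total
```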