Embedding Logical Queries on Knowledge Graphs

Authors: Will Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum." |
| Researcher Affiliation | Academia | "William L. Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec. {wleif, pbajaj, jurafsky}@stanford.edu, {jure, marinka}@cs.stanford.edu. Stanford University, Department of Computer Science, Department of Linguistics" |
| Pseudocode | Yes | "The core of our framework is Algorithm 1, which maps any conjunctive input query q to an embedding q ∈ ℝ^d using two differentiable operators, P and I, described below." |
| Open Source Code | Yes | "Code and data is available at https://github.com/williamleif/graphqembed." |
| Open Datasets | Yes | "Code and data is available at https://github.com/williamleif/graphqembed. We run experiments on the biological interaction (Bio) and Reddit datasets (Figure 2). ... Example 1: Drug interactions (Figure 2.a). A knowledge graph derived from a number of public biomedical databases (Appendix B). ... Example 2: Reddit dynamics (Figure 2.b). We also consider a graph-based representation of Reddit, one of the most popular websites in the world." |
| Dataset Splits | Yes | "For training we sampled 10^6 queries with two edges and 10^6 queries with three edges, with equal numbers of samples for each different type of query DAG structure. For testing, we sampled 10,000 test queries for each DAG structure with two or three edges and ensured that these test queries involved missing edges (see above). We further sampled 1,000 test queries for each possible DAG structure to use for validation (e.g., for early stopping). We used all edges in the training graph as training examples for size-1 queries (i.e., edge prediction), and we used a 90/10 split of the deleted edges to form the test and validation sets for size-1 queries." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | "For all baselines and variants, we used PyTorch [30], the Adam optimizer, an embedding dimension d = 128, a batch size of 256, and tested learning rates {0.1, 0.01, 0.001}." The paper mentions PyTorch but does not specify a version number. |
| Experiment Setup | Yes | "For all baselines and variants, we used PyTorch [30], the Adam optimizer, an embedding dimension d = 128, a batch size of 256, and tested learning rates {0.1, 0.01, 0.001}." |
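The Pseudocode row refers to Algorithm 1, which composes a geometric projection operator P (following a relation edge) and a permutation-invariant intersection operator I (conjoining branch embeddings). The following is a minimal NumPy sketch of those two operators, not the authors' implementation: the relation names and random matrices are illustrative stand-ins for learned parameters, and the intersection is sketched DeepSets-style (feature map, then symmetric mean); the actual PyTorch code is in the repository linked above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128  # embedding dimension used in the paper

# Stand-ins for learned parameters: one projection matrix per relation
# type tau, plus weights for the intersection's feature map and readout.
R = {"targets": rng.normal(scale=0.1, size=(d, d)),
     "treats": rng.normal(scale=0.1, size=(d, d))}
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))

def project(q, tau):
    """P: follow relation tau from query embedding q (q' = R_tau q)."""
    return R[tau] @ q

def intersect(embeddings):
    """I: intersect a set of branch embeddings; elementwise mean of
    per-input ReLU features makes the result order-invariant."""
    feats = [np.maximum(W1 @ q, 0.0) for q in embeddings]
    return W2 @ np.mean(feats, axis=0)

# Embed a two-branch conjunctive query, e.g. "proteins targeted by
# drug x AND drug y" (entity embeddings here are random placeholders):
x, y = rng.normal(size=d), rng.normal(size=d)
q = intersect([project(x, "targets"), project(y, "targets")])
```

Candidate answers would then be ranked by similarity between their entity embeddings and q, as in the paper's nearest-neighbor retrieval setup.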
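The Dataset Splits row describes a 90/10 split of the deleted edges into test and validation sets for size-1 (edge-prediction) queries. A sketch of that step, assuming edges are simple tuples (the function name and seeding are illustrative, not from the paper):

```python
import random

def split_deleted_edges(deleted_edges, seed=0):
    """Split held-out (deleted) edges 90/10 into test and validation
    sets, mirroring the size-1-query split described in the paper."""
    edges = list(deleted_edges)
    random.Random(seed).shuffle(edges)
    cut = int(0.9 * len(edges))
    return edges[:cut], edges[cut:]  # (test, validation)

deleted = [("drug_a", "targets", f"protein_{i}") for i in range(100)]
test_edges, val_edges = split_deleted_edges(deleted)
# len(test_edges) == 90, len(val_edges) == 10
```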