Embedding Logical Queries on Knowledge Graphs

Authors: Will Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate the utility of this framework in two application studies on real-world datasets with millions of relations: predicting logical relationships in a network of drug-gene-disease interactions and in a graph-based representation of social interactions derived from a popular web forum." |
| Researcher Affiliation | Academia | "William L. Hamilton, Payal Bajaj, Marinka Zitnik, Dan Jurafsky, Jure Leskovec. {wleif, pbajaj, jurafsky}@stanford.edu, {jure, marinka}@cs.stanford.edu. Stanford University, Department of Computer Science, Department of Linguistics" |
| Pseudocode | Yes | "The core of our framework is Algorithm 1, which maps any conjunctive input query q to an embedding q ∈ ℝ^d using two differentiable operators, P and I, described below." |
| Open Source Code | Yes | "Code and data is available at https://github.com/williamleif/graphqembed." |
| Open Datasets | Yes | "Code and data is available at https://github.com/williamleif/graphqembed. We run experiments on the biological interaction (Bio) and Reddit datasets (Figure 2). ... Example 1: Drug interactions (Figure 2.a). A knowledge graph derived from a number of public biomedical databases (Appendix B). ... Example 2: Reddit dynamics (Figure 2.b). We also consider a graph-based representation of Reddit, one of the most popular websites in the world." |
| Dataset Splits | Yes | "For training we sampled 10^6 queries with two edges and 10^6 queries with three edges, with equal numbers of samples for each different type of query DAG structure. For testing, we sampled 10,000 test queries for each DAG structure with two or three edges and ensured that these test queries involved missing edges (see above). We further sampled 1,000 test queries for each possible DAG structure to use for validation (e.g., for early stopping). We used all edges in the training graph as training examples for size-1 queries (i.e., edge prediction), and we used a 90/10 split of the deleted edges to form the test and validation sets for size-1 queries." |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | "For all baselines and variants, we used PyTorch [30], the Adam optimizer, an embedding dimension d = 128, a batch size of 256, and tested learning rates {0.1, 0.01, 0.001}." The paper mentions PyTorch but does not specify a version number. |
| Experiment Setup | Yes | "For all baselines and variants, we used PyTorch [30], the Adam optimizer, an embedding dimension d = 128, a batch size of 256, and tested learning rates {0.1, 0.01, 0.001}." |
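The Pseudocode row refers to Algorithm 1, which composes a geometric projection operator P (following a relation edge) and a permutation-invariant intersection operator I (conjoining branch embeddings). The following is a minimal NumPy sketch of those two operators, not the authors' implementation: the relation names and random matrices are illustrative stand-ins for learned parameters, and the intersection is sketched DeepSets-style (feature map, then symmetric mean); the actual PyTorch code is in the repository linked above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128  # embedding dimension used in the paper

# Stand-ins for learned parameters: one projection matrix per relation
# type tau, plus weights for the intersection's feature map and readout.
R = {"targets": rng.normal(scale=0.1, size=(d, d)),
     "treats": rng.normal(scale=0.1, size=(d, d))}
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))

def project(q, tau):
    """P: follow relation tau from query embedding q (q' = R_tau q)."""
    return R[tau] @ q

def intersect(embeddings):
    """I: intersect a set of branch embeddings; elementwise mean of
    per-input ReLU features makes the result order-invariant."""
    feats = [np.maximum(W1 @ q, 0.0) for q in embeddings]
    return W2 @ np.mean(feats, axis=0)

# Embed a two-branch conjunctive query, e.g. "proteins targeted by
# drug x AND drug y" (entity embeddings here are random placeholders):
x, y = rng.normal(size=d), rng.normal(size=d)
q = intersect([project(x, "targets"), project(y, "targets")])
```

Candidate answers would then be ranked by similarity between their entity embeddings and q, as in the paper's nearest-neighbor retrieval setup.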
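The Dataset Splits row describes a 90/10 split of the deleted edges into test and validation sets for size-1 (edge-prediction) queries. A sketch of that step, assuming edges are simple tuples (the function name and seeding are illustrative, not from the paper):

```python
import random

def split_deleted_edges(deleted_edges, seed=0):
    """Split held-out (deleted) edges 90/10 into test and validation
    sets, mirroring the size-1-query split described in the paper."""
    edges = list(deleted_edges)
    random.Random(seed).shuffle(edges)
    cut = int(0.9 * len(edges))
    return edges[:cut], edges[cut:]  # (test, validation)

deleted = [("drug_a", "targets", f"protein_{i}") for i in range(100)]
test_edges, val_edges = split_deleted_edges(deleted)
# len(test_edges) == 90, len(val_edges) == 10
```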