Rule Induction in Knowledge Graphs Using Linear Programming

Authors: Sanjeeb Dash, João Gonçalves

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct KGC experiments with 5 datasets: Kinship (Denham 1973), UMLS (McCray 2003), FB15k-237 (Toutanova and Chen 2015), WN18RR (Dettmers et al. 2018), and YAGO3-10 (Mahdisoltani, Biega, and Suchanek 2015). The partition of FB15k-237, WN18RR, and YAGO3-10 into training, testing, and validation data sets is standard. We use the partition for UMLS and Kinship in Dettmers et al. (2018). In Table 1, we give the number of entities and relations in each dataset, and facts in each partition.
Researcher Affiliation | Industry | Sanjeeb Dash, João Gonçalves; IBM Research, Yorktown Heights, New York, USA; {sanjeebd, jpgoncal}@us.ibm.com
Pseudocode | Yes | Algorithm 1: LPRules
Open Source Code | Yes | We ran two variants of our code which we call LPRules¹ (see Algorithm 1). ¹ https://github.com/IBM/LPRules
Open Datasets | Yes | We conduct KGC experiments with 5 datasets: Kinship (Denham 1973), UMLS (McCray 2003), FB15k-237 (Toutanova and Chen 2015), WN18RR (Dettmers et al. 2018), and YAGO3-10 (Mahdisoltani, Biega, and Suchanek 2015).
Dataset Splits | Yes | A collection of facts F is divided into a training set F_tr, a validation set F_v, and a test set F_te; the KG G corresponding to F_tr is constructed and a scoring function is learnt from G and evaluated on the test set. [...] We use the validation data set to select those τ and κ that yield the best MRR.
Hardware Specification | Yes | We run the rule-based codes on a 60-core machine with 128 GB of RAM and four 2.8 GHz Intel Xeon E7-4890 v2 processors, each with 15 cores.
Software Dependencies | Yes | In our code, we execute rule generation for each relation on a different thread, and solve LPs with CPLEX (IBM 2019). The reference (IBM 2019) specifies "IBM ILOG CPLEX Optimization Studio 12.10.0."
Experiment Setup | Yes | We search for the best τ (from an input list) and κ for each relation. We dynamically set a base value for κ equal to the length of the longest rule generated plus one. We then perform 20 iterations where, at the ith iteration, we set κ to i times that base value. We use the validation data set to select those τ and κ that yield the best MRR. We set the maximum rule length to 6 for WN18RR, 3 for YAGO3-10, and 4 for the other datasets. Thus κ ≤ 100 except for WN18RR.
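
The hyperparameter search described in the Experiment Setup entry can be sketched roughly as follows. This is a minimal illustration, not the LPRules code: it assumes that "set κ to i κ" means i times a base value (the longest rule length plus one), and the names `kappa_schedule`, `mean_reciprocal_rank`, `select_tau_kappa`, and the `validation_ranks` callback are hypothetical. The callback stands in for whatever step solves the LP for a given (τ, κ) and ranks the validation facts.

```python
from itertools import product


def kappa_schedule(longest_rule_length, iterations=20):
    """Candidate kappa values: i * (longest rule length + 1) for i = 1..iterations.

    Assumption: the i-th iteration multiplies a fixed base value by i.
    """
    base = longest_rule_length + 1
    return [i * base for i in range(1, iterations + 1)]


def mean_reciprocal_rank(ranks):
    """Mean reciprocal rank of a non-empty list of 1-based ranks."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def select_tau_kappa(taus, kappas, validation_ranks):
    """Return (best_mrr, tau, kappa) over the grid of candidates.

    validation_ranks(tau, kappa) is a hypothetical callback returning the
    ranks of the correct entities on the validation set for that setting.
    """
    best = None
    for tau, kappa in product(taus, kappas):
        mrr = mean_reciprocal_rank(validation_ranks(tau, kappa))
        if best is None or mrr > best[0]:
            best = (mrr, tau, kappa)
    return best
```

Under this reading, a longest generated rule of length 4 gives a base of 5 and a largest candidate of 20 × 5 = 100, while length 6 (the WN18RR maximum) gives 20 × 7 = 140, which matches the remark that WN18RR is the exception.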