Rule Induction in Knowledge Graphs Using Linear Programming
Authors: Sanjeeb Dash, João Gonçalves
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct KGC experiments with 5 datasets: Kinship (Denham 1973), UMLS (McCray 2003), FB15k-237 (Toutanova and Chen 2015), WN18RR (Dettmers et al. 2018), and YAGO3-10 (Mahdisoltani, Biega, and Suchanek 2015). The partition of FB15k-237, WN18RR, and YAGO3-10 into training, testing, and validation data sets is standard. We use the partition for UMLS and Kinship in Dettmers et al. (2018). In Table 1, we give the number of entities and relations in each dataset, and facts in each partition. |
| Researcher Affiliation | Industry | Sanjeeb Dash, João Gonçalves; IBM Research, Yorktown Heights, New York, USA; {sanjeebd, jpgoncal}@us.ibm.com |
| Pseudocode | Yes | Algorithm 1: LPRules |
| Open Source Code | Yes | We ran two variants of our code which we call LPRules¹ (see Algorithm 1). ¹ https://github.com/IBM/LPRules |
| Open Datasets | Yes | We conduct KGC experiments with 5 datasets: Kinship (Denham 1973), UMLS (McCray 2003), FB15k-237 (Toutanova and Chen 2015), WN18RR (Dettmers et al. 2018), and YAGO3-10 (Mahdisoltani, Biega, and Suchanek 2015). |
| Dataset Splits | Yes | A collection of facts F is divided into a training set F_tr, a validation set F_v, and a test set F_te; the KG G corresponding to F_tr is constructed and a scoring function is learnt from G and evaluated on the test set. [...] We use the validation data set to select those τ and κ that yield the best MRR. |
| Hardware Specification | Yes | We run the rule-based codes on a 60 core machine with 128 GBytes of RAM, and four 2.8 GHz Intel Xeon E7-4890 v2 processors, each with 15 cores. |
| Software Dependencies | Yes | In our code, we execute rule generation for each relation on a different thread, and solve LPs with CPLEX (IBM 2019). The reference (IBM 2019) specifies "IBM ILOG CPLEX Optimization Studio 12.10.0." |
| Experiment Setup | Yes | We search for the best τ (from an input list) and κ for each relation. We dynamically let κ equal the length of the longest rule generated plus one. We then perform 20 iterations where, at the ith iteration, we set κ to i·κ. We use the validation data set to select those τ and κ that yield the best MRR. We set the maximum rule length to 6 for WN18RR, 3 for YAGO3-10, and 4 for the other datasets. Thus κ ≤ 100 except for WN18RR. |
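
The κ schedule and validation-based selection quoted in the Experiment Setup row can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the LPRules implementation: the function names (`kappa_schedule`, `mean_reciprocal_rank`, `select_tau_kappa`) and the `validation_ranks` callback are hypothetical, and the sketch assumes that the value tried at iteration i is i times the base κ (longest generated rule length plus one).

```python
from itertools import product


def kappa_schedule(longest_rule_length, iterations=20):
    """Candidate kappa values: i * (longest rule length + 1) for i = 1..iterations.

    Assumption: the base kappa is fixed once and then scaled by the
    iteration index, as the quoted setup suggests.
    """
    base = longest_rule_length + 1
    return [i * base for i in range(1, iterations + 1)]


def mean_reciprocal_rank(ranks):
    """Mean reciprocal rank for a list of 1-based ranks of the correct answers."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def select_tau_kappa(taus, kappas, validation_ranks):
    """Return the (tau, kappa) pair with the highest validation MRR.

    `validation_ranks` is a hypothetical callback mapping (tau, kappa) to
    the ranks obtained on the validation facts for one relation.
    """
    return max(
        product(taus, kappas),
        key=lambda pair: mean_reciprocal_rank(validation_ranks(*pair)),
    )


if __name__ == "__main__":
    # Maximum rule length 4 (datasets other than WN18RR and YAGO3-10).
    kappas = kappa_schedule(4)
    print(kappas[0], kappas[-1])

    # Toy stand-in for a real validation run; the numbers are illustrative only.
    def toy_ranks(tau, kappa):
        return [1, 2, 4] if (tau, kappa) == (0.1, 10) else [3, 5, 8]

    print(select_tau_kappa([0.01, 0.1], kappas, toy_ranks))
```

Under this reading, a longest rule of length 4 gives a base κ of 5 and a largest tried value of 100, while a longest rule of length 6 (the WN18RR limit) gives a largest value of 140, which matches the remark that the bound of 100 does not hold for WN18RR.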