Knowledge Base Completion Using Embeddings and Rules

Authors: Quan Wang, Bin Wang, Li Guo

IJCAI 2015

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results on two publicly available data sets show that our approach significantly and consistently outperforms state-of-the-art embedding models in KB completion." |
| Researcher Affiliation | Academia | Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China |
| Pseudocode | No | The paper presents the Integer Linear Programming (ILP) formulation in Figure 2, but it does not contain pseudocode or an algorithm block. |
| Open Source Code | No | "We implement RESCAL and TRESCAL in Java, and use the code released by Bordes et al. [2013] for TransE." (https://github.com/glorotxa/SME) |
| Open Datasets | Yes | "We create two data sets Location and Sport using NELL, both containing five relations (listed in Table 1) and the associated triples." |
| Dataset Splits | Yes | "To evaluate, for each data set, we split the triples into a training set and a test set, with the ratio of 4:1." |
| Hardware Specification | No | The paper reports computation times ("It takes about 1 minute on Location data and 2 hours on Sport data") but gives no details about the hardware used (e.g., GPU/CPU models, memory). |
| Software Dependencies | Yes | "We implement RESCAL and TRESCAL in Java, and use the code released by Bordes et al. [2013] for TransE. We use the lp_solve package to solve the ILP problems." (lp_solve: http://lpsolve.sourceforge.net/5.5/) |
| Experiment Setup | Yes | "In RESCAL and TRESCAL, we fix the regularization parameter λ to 0.1, and the maximal number of iterations to 10... In TransE, we fix the margin to 1, the learning rate to 10, the batch number to 5, and the maximal number of iterations again to 10... For each of the three models, we tune the latent dimension d in the range of {10, 20, 30, 40, 50} and select the optimal parameter setting." |
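The ILP-based joint inference referenced above (embedding scores combined with logical rules, solved with lp_solve in the paper) can be illustrated with a minimal sketch. This is not the paper's formulation: the candidate triples, scores, and rules are invented for illustration, and a brute-force 0/1 search stands in for a real ILP solver, which only works because the toy candidate set is tiny.

```python
from itertools import product

# Hypothetical candidate triples with embedding-model confidence scores
# (names and scores are illustrative, not taken from the paper).
candidates = [
    ("Dallas", "CityLocatedInState", "Texas"),       # x0
    ("Dallas", "CityLocatedInState", "California"),  # x1
    ("Texas", "StateLocatedInCountry", "USA"),       # x2
]
scores = [0.9, 0.4, 0.8]

# Rules encoded as linear constraints over 0/1 variables:
# mutual exclusion: a city lies in at most one state  -> x0 + x1 <= 1
# entailment (illustrative): accepting x0 requires x2 -> x0 - x2 <= 0
constraints = [
    lambda x: x[0] + x[1] <= 1,
    lambda x: x[0] - x[2] <= 0,
]

def solve_ilp(scores, constraints):
    """Brute-force the 0/1 ILP: maximize sum(score_i * x_i) subject to the
    rule constraints. A real system would hand this to an ILP solver such
    as lp_solve; enumeration is only feasible for a handful of variables."""
    best, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(scores)):
        if all(c(x) for c in constraints):
            val = sum(s * xi for s, xi in zip(scores, x))
            if val > best_val:
                best, best_val = x, val
    return best, best_val

assignment, objective = solve_ilp(scores, constraints)
print(assignment, objective)  # accepts x0 and x2, rejects the conflicting x1
```

The point of the sketch is the interaction: the embedding model alone would happily accept both state triples for Dallas, while the rule constraints force the solver to keep only the higher-scoring, logically consistent subset.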