Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Knowledge Base Completion Using Embeddings and Rules

Authors: Quan Wang, Bin Wang, Li Guo

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on two publicly available data sets show that our approach significantly and consistently outperforms state-of-the-art embedding models in KB completion."
Researcher Affiliation | Academia | "Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China"
Pseudocode | No | The paper presents the Integer Linear Programming (ILP) formulation in Figure 2 but contains no pseudocode or algorithm block.
Open Source Code | No | "We implement RESCAL and TRESCAL in Java, and use the code released by Bordes et al. [2013] for TransE." (footnote 5: https://github.com/glorotxa/SME)
Open Datasets | Yes | "We create two data sets Location and Sport using NELL, both containing five relations (listed in Table 1) and the associated triples."
Dataset Splits | Yes | "To evaluate, for each data set, we split the triples into a training set and a test set, with the ratio of 4:1."
Hardware Specification | No | The paper mentions computation times ("It takes about 1 minute on Location data and 2 hours on Sport data") but does not specify the hardware used (e.g., CPU/GPU models, memory).
Software Dependencies | Yes | "We implement RESCAL and TRESCAL in Java, and use the code released by Bordes et al. [2013] for TransE. We use the lp_solve package to solve the ILP problems." (footnote 6: http://lpsolve.sourceforge.net/5.5/)
Experiment Setup | Yes | "In RESCAL and TRESCAL, we fix the regularization parameter λ to 0.1, and the maximal number of iterations to 10... In TransE, we fix the margin to 1, the learning rate to 10, the batch number to 5, and the maximal number of iterations again to 10... For each of the three models, we tune the latent dimension d in the range of {10, 20, 30, 40, 50} and select the optimal parameter setting."
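For illustration only, the 4:1 train/test split and the latent-dimension grid search quoted above can be sketched as follows. The triple data and the `evaluate` scoring function are hypothetical placeholders (the paper's actual implementation was in Java, and real triples come from NELL):

```python
import random

# Hypothetical KB triples (head, relation, tail); the real data is drawn from NELL.
triples = [(f"e{i}", f"r{i % 5}", f"e{i + 1}") for i in range(100)]

# 4:1 train/test split, as described in the paper.
random.seed(0)
random.shuffle(triples)
cut = len(triples) * 4 // 5
train, test = triples[:cut], triples[cut:]

# Grid search over the latent dimension d in {10, 20, 30, 40, 50}.
def evaluate(d, train, test):
    """Placeholder for training an embedding model of dimension d
    and returning its accuracy on the held-out test set."""
    return 1.0 - 1.0 / d  # dummy monotone score, for illustration only

best_d = max([10, 20, 30, 40, 50], key=lambda d: evaluate(d, train, test))
```

With the dummy score, `best_d` simply picks the largest dimension; in the actual experiments each candidate d would be scored by the trained model's KB-completion performance.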