Unsupervised Embedding Enhancements of Knowledge Graphs using Textual Associations
Authors: Neil Veira, Brian Keng, Kanchana Padmanabhan, Andreas Veneris
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that both methods can offer improvement on the link prediction task when applied to many different knowledge graph embedding models. As such, we empirically show performance improvement on knowledge completion tasks with six different models. In this section we evaluate the proposed embedding enhancement methods on standard subsets of Freebase [Bollacker et al., 2008] and Wordnet [Miller, 1995]. |
| Researcher Affiliation | Collaboration | ¹Department of Electrical and Computer Engineering, University of Toronto; ²Data Science, Rubikloud Technologies Inc. |
| Pseudocode | No | The paper describes the proposed models and procedures mathematically and in natural language, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | For reproducibility all code and data has been made publicly available¹. ¹github.com/rubikloud/kg-text-embeddings |
| Open Datasets | Yes | We use a subset of Wordnet introduced by [Bordes et al., 2013] named WN18, consisting of 18 relations, 40,943 entities, and 156,442 facts. In our experiments we use FB15k, a subset of Freebase by [Bordes et al., 2013] consisting of 1,345 relations, 14,951 entities, and 592,213 facts. As an unstructured text corpus we use the Google News data set and word2vec vectors pretrained on it². ²code.google.com/archive/p/word2vec. For reproducibility all code and data has been made publicly available¹. Table 4: Dataset characteristics, listing # Train triplets. |
| Dataset Splits | Yes | This table also gives the number of words in the vocabularies and the breakdown of triplets into training, validation, and test sets. (referring to Table 4) Table 4: Dataset characteristics, listing # Train and # Valid triplets. The learning rate was selected based on validation performance... |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions algorithms like the 'AdaGrad algorithm' and the 'word2vec model' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Training in all experiments is performed using 200 epochs and a batch size of 1024. The learning rate was selected based on validation performance, resulting in a learning rate of 0.01 for the baseline and Feature Sum experiments and 0.1 for WV, WWV, and PE-WWV. We use an embedding dimensionality of d = 100 and a margin of γ = 1.0 for the ranking loss. |
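The hyperparameters reported in the Experiment Setup row can be collected into a minimal configuration sketch. This is illustrative only: the variable names and lookup helper below are assumptions and do not come from the released rubikloud/kg-text-embeddings code, though the numeric values are those stated in the paper.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Names are illustrative; only the values are taken from the paper.
config = {
    "epochs": 200,
    "batch_size": 1024,
    "embedding_dim": 100,   # d = 100
    "margin": 1.0,          # gamma for the ranking loss
    # Learning rates were selected on validation performance,
    # per method.
    "learning_rate": {
        "baseline": 0.01,
        "feature_sum": 0.01,
        "WV": 0.1,
        "WWV": 0.1,
        "PE-WWV": 0.1,
    },
}

def lr_for(method: str) -> float:
    """Return the validation-selected learning rate for a method."""
    return config["learning_rate"][method]
```

For example, `lr_for("WV")` returns `0.1`, while `lr_for("baseline")` returns `0.01`, matching the paper's reported settings.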