Unsupervised Embedding Enhancements of Knowledge Graphs using Textual Associations

Authors: Neil Veira, Brian Keng, Kanchana Padmanabhan, Andreas Veneris

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that both methods can offer improvement on the link prediction task when applied to many different knowledge graph embedding models. As such, we empirically show performance improvement on knowledge completion tasks with six different models. In this section we evaluate the proposed embedding enhancement methods on standard subsets of Freebase [Bollacker et al., 2008] and Wordnet [Miller, 1995]. (A hedged link-prediction evaluation sketch follows the table.)
Researcher Affiliation | Collaboration | (1) Department of Electrical and Computer Engineering, University of Toronto; (2) Data Science, Rubikloud Technologies Inc.
Pseudocode | No | The paper describes the proposed models and procedures mathematically and in natural language, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | For reproducibility all code and data has been made publicly available (github.com/rubikloud/kg-text-embeddings).
Open Datasets | Yes | We use a subset of Wordnet introduced by [Bordes et al., 2013] named WN18, consisting of 18 relations, 40,943 entities, and 156,442 facts. In our experiments we use FB15k, a subset of Freebase introduced by [Bordes et al., 2013] consisting of 1,345 relations, 14,951 entities, and 592,213 facts. As an unstructured text corpus we use the Google News data set and word2vec vectors pretrained on it (code.google.com/archive/p/word2vec). For reproducibility all code and data has been made publicly available (github.com/rubikloud/kg-text-embeddings). Table 4 (Dataset characteristics) lists the number of training triplets. (A hedged data-loading sketch follows the table.)
Dataset Splits | Yes | This table also gives the number of words in the vocabularies and the breakdown of triplets into training, validation, and test sets (referring to Table 4, which lists the training, validation, and test triplet counts). The learning rate was selected based on validation performance...
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions the AdaGrad algorithm and the word2vec model but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Training in all experiments is performed using 200 epochs and a batch size of 1024. The learning rate was selected based on validation performance, resulting in a learning rate of 0.01 for the baseline and Feature Sum experiments and 0.1 for WV, WWV, and PE-WWV. We use an embedding dimensionality of d = 100 and a margin of γ = 1.0 for the ranking loss. (A hedged training-loss sketch follows the table.)
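
The Open Datasets row points to the standard FB15k and WN18 releases together with word2vec vectors pretrained on Google News. As a minimal illustration of how that data can be read, the sketch below uses gensim and assumes the usual tab-separated triplet files; the gensim dependency and the file names are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch: load the pretrained Google News word2vec vectors and a
# tab-separated triplet file. gensim and the file names are assumptions.
from gensim.models import KeyedVectors

# 300-dimensional vectors distributed at code.google.com/archive/p/word2vec
word_vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

def load_triplets(path):
    """Read (head, relation, tail) triplets, one tab-separated triplet per line,
    as in the standard FB15k / WN18 distributions."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.strip().split("\t")) for line in f]

train_triplets = load_triplets("train.txt")  # illustrative file name
```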
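
The Experiment Setup row fixes the main hyperparameters (200 epochs, batch size 1024, d = 100, margin γ = 1.0, learning rate 0.01 or 0.1, with AdaGrad mentioned as the optimizer). A minimal sketch of how those values plug into a margin-based ranking loss is given below; PyTorch is an assumption, since the paper does not name its framework, and the scoring model itself is left abstract.

```python
# Sketch: margin-based ranking loss with the hyperparameters quoted above.
# PyTorch is assumed; the embedding model and scoring function are left abstract.
import torch

EPOCHS = 200          # "200 epochs"
BATCH_SIZE = 1024     # "batch size of 1024"
EMBED_DIM = 100       # embedding dimensionality d = 100
MARGIN = 1.0          # ranking-loss margin gamma = 1.0
LEARNING_RATE = 0.01  # 0.01 for baseline / Feature Sum; 0.1 for WV, WWV, PE-WWV

def margin_ranking_loss(pos_dist, neg_dist, margin=MARGIN):
    """max(0, margin + d(positive) - d(negative)), averaged over the batch:
    true triplets should score closer than corrupted ones by at least `margin`."""
    return torch.clamp(margin + pos_dist - neg_dist, min=0.0).mean()

# The paper mentions AdaGrad; in PyTorch that would correspond to
# torch.optim.Adagrad(model.parameters(), lr=LEARNING_RATE).
```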
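
Finally, the link prediction task referenced under Research Type ranks every candidate entity for a held-out triplet and reports metrics such as mean rank or Hits@10. The sketch below shows a raw (unfiltered) Hits@k loop for tail prediction, assuming a generic score(head, relation, tail) function where lower means more plausible; that function is a placeholder, not anything defined in the paper.

```python
# Sketch: unfiltered Hits@k for tail prediction, given a generic scoring function.
def hits_at_k(test_triplets, entities, score, k=10):
    hits = 0
    for head, relation, tail in test_triplets:
        # Rank all candidate tails by their score for this (head, relation) pair.
        ranked = sorted(entities, key=lambda e: score(head, relation, e))
        if tail in ranked[:k]:
            hits += 1
    return hits / len(test_triplets)
```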