Unsupervised Embedding Enhancements of Knowledge Graphs using Textual Associations
Authors: Neil Veira, Brian Keng, Kanchana Padmanabhan, Andreas Veneris
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that both methods can offer improvement on the link prediction task when applied to many different knowledge graph embedding models. As such, we empirically show performance improvement on knowledge completion tasks with six different models. In this section we evaluate the proposed embedding enhancement methods on standard subsets of Freebase [Bollacker et al., 2008] and Wordnet [Miller, 1995]. |
| Researcher Affiliation | Collaboration | ¹Department of Electrical and Computer Engineering, University of Toronto; ²Data Science, Rubikloud Technologies Inc. |
| Pseudocode | No | The paper describes the proposed models and procedures mathematically and in natural language, but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | For reproducibility all code and data has been made publicly available¹. ¹github.com/rubikloud/kg-text-embeddings |
| Open Datasets | Yes | We use a subset of Wordnet introduced by [Bordes et al., 2013] named WN18, consisting of 18 relations, 40,943 entities, and 156,442 facts. In our experiments we use FB15k, a subset of Freebase by [Bordes et al., 2013] consisting of 1,345 relations, 14,951 entities, and 592,213 facts. As an unstructured text corpus we use the Google News data set and word2vec vectors pretrained on it². ²code.google.com/archive/p/word2vec. For reproducibility all code and data has been made publicly available¹. Table 4: Dataset characteristics, listing # Train triplets. |
| Dataset Splits | Yes | This table also gives the number of words in the vocabularies and the breakdown of triplets into training, validation, and test sets. (referring to Table 4) Table 4: Dataset characteristics, listing # Train and # Valid triplets. The learning rate was selected based on validation performance... |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions algorithms like the 'AdaGrad algorithm' and the 'word2vec model' but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Training in all experiments is performed using 200 epochs and a batch size of 1024. The learning rate was selected based on validation performance, resulting in a learning rate of 0.01 for the baseline and Feature Sum experiments and 0.1 for WV, WWV, and PE-WWV. We use an embedding dimensionality of d = 100 and a margin of γ = 1.0 for the ranking loss. |
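The hyperparameters reported in the Experiment Setup row can be collected into a minimal configuration sketch. This is illustrative only: the variable names and lookup helper below are assumptions and do not come from the released rubikloud/kg-text-embeddings code, though the numeric values are those stated in the paper.

```python
# Hyperparameters as reported in the paper's experiment setup.
# Names are illustrative; only the values are taken from the paper.
config = {
    "epochs": 200,
    "batch_size": 1024,
    "embedding_dim": 100,   # d = 100
    "margin": 1.0,          # gamma for the ranking loss
    # Learning rates were selected on validation performance,
    # per method.
    "learning_rate": {
        "baseline": 0.01,
        "feature_sum": 0.01,
        "WV": 0.1,
        "WWV": 0.1,
        "PE-WWV": 0.1,
    },
}

def lr_for(method: str) -> float:
    """Return the validation-selected learning rate for a method."""
    return config["learning_rate"][method]
```

For example, `lr_for("WV")` returns `0.1`, while `lr_for("baseline")` returns `0.01`, matching the paper's reported settings.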