Analogical Inference for Multi-relational Embeddings
Authors: Hanxiao Liu, Yuexin Wu, Yiming Yang
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments: We evaluate ANALOGY and the baselines over two benchmark datasets for multi-relational embedding released by previous work (Bordes et al., 2013), namely a subset of Freebase (FB15K) for generic facts and WordNet (WN18) for lexical relationships between words. Table 2: Hits@10 (filt.) of all models on WN18 and FB15K. In both tables, ANALOGY performs either the best or the 2nd best, which is in the equivalence class with the best score in each case according to a statistical significance test. (A sketch of the filtered ranking metrics appears after this table.) |
| Researcher Affiliation | Academia | 1Carnegie Mellon University, Pittsburgh, PA 15213, USA. Correspondence to: Hanxiao Liu <hanxiaol@cs.cmu.edu>. |
| Pseudocode | No | The paper describes algorithmic steps and mathematical formulations but does not present a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | Our C++ implementation (footnote 2) runs over a CPU, as ANALOGY only requires lightweight linear algebra routines. We use asynchronous stochastic gradient descent (SGD) for optimization, where the gradients with respect to different mini-batches are simultaneously evaluated in multiple threads, and the gradient updates for the shared model parameters are carried out without synchronization. Asynchronous SGD is highly efficient, and causes little performance drop when parameters associated with different mini-batches are mutually disjoint with a high probability (Recht et al., 2011). We adapt the learning rate based on historical gradients using AdaGrad (Duchi et al., 2011). Footnote 2: Code available at https://github.com/quark0/ANALOGY. (An AdaGrad update sketch appears after this table.) |
| Open Datasets | Yes | We evaluate ANALOGY and the baselines over two benchmark datasets for multi-relational embedding released by previous work (Bordes et al., 2013), namely a subset of Freebase (FB15K) for generic facts and WordNet (WN18) for lexical relationships between words. |
| Dataset Splits | Yes | The dataset statistics are summarized in Table 1. Table 1 (dataset statistics): FB15K has 14,951 entities, 1,345 relations, 483,142 training triples, 50,000 validation triples, and 59,071 test triples; WN18 has 40,943 entities, 18 relations, 141,442 training triples, 5,000 validation triples, and 5,000 test triples. |
| Hardware Specification | No | Our C++ implementation (footnote 2) runs over a CPU, as ANALOGY only requires lightweight linear algebra routines. ... The paper states it runs on a CPU but does not provide specific details such as CPU model, number of cores, or other hardware specifications. |
| Software Dependencies | No | Our C++ implementation (footnote 2) runs over a CPU... We adapt the learning rate based on historical gradients using AdaGrad (Duchi et al., 2011). The paper mentions C++ and AdaGrad but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We conducted a grid search to find the hyperparameters of ANALOGY which maximize the filtered MRR on the validation set, by enumerating all combinations of the embedding size m ∈ {100, 150, 200}, the ℓ2 weight decay factor λ ∈ {10⁻¹, 10⁻², 10⁻³} of model coefficients v and W, and the ratio of negative over positive samples α ∈ {3, 6}. The resulting hyperparameters for the WN18 dataset are m = 200, λ = 10⁻², α = 3, and those for the FB15K dataset are m = 200, λ = 10⁻³, α = 6. The number of scalars on the diagonal of each B_r is always set to m/2. We set the initial learning rate to 0.1 for both datasets and adjust it using AdaGrad during optimization. All models are trained for 500 epochs. (A grid-search sketch appears after this table.) |
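
The Hits@10 (filt.) and filtered MRR numbers quoted above follow the filtered ranking protocol of Bordes et al. (2013): for each test triple, every corrupted candidate that happens to form another known true triple is removed before computing the rank of the correct entity. The sketch below is a minimal illustration of that protocol, assuming a generic `score_fn(h, r, candidate_tails)` and integer-indexed triples, and ranking only over tails for brevity (the paper ranks both heads and tails); it is not the authors' C++ code.

```python
import numpy as np

def filtered_metrics(score_fn, test_triples, known_triples, num_entities):
    """Filtered MRR and Hits@10 over tail prediction (sketch)."""
    known = set(known_triples)                            # union of train/valid/test triples
    ranks = []
    for h, r, t in test_triples:
        scores = score_fn(h, r, np.arange(num_entities))  # higher = more plausible
        true_score = scores[t]
        for cand in range(num_entities):
            # "Filtered" setting: drop candidates that are themselves true triples.
            if cand != t and (h, r, cand) in known:
                scores[cand] = -np.inf
        ranks.append(1 + int(np.sum(scores > true_score)))
    ranks = np.asarray(ranks, dtype=float)
    return {"MRR": float(np.mean(1.0 / ranks)),
            "Hits@10": float(np.mean(ranks <= 10))}
```

Hits@10 (filt.) is then the fraction of test triples whose correct entity ranks in the top 10, and filtered MRR is the mean of 1/rank.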
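The training described in the Open Source Code row (asynchronous SGD with AdaGrad over CPU threads) reduces to per-triple gradient steps with per-coordinate adaptive learning rates. The single-threaded sketch below only illustrates the AdaGrad update with ℓ2 weight decay and a logistic loss on positive and sampled negative triples; the diagonal bilinear score and all parameter names are illustrative assumptions, not the ANALOGY model or the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, m = 1000, 20, 200   # toy sizes; m = embedding size
lr, lam = 0.1, 1e-3                              # initial learning rate, L2 weight decay

entity_emb = rng.normal(scale=0.1, size=(num_entities, m))
rel_emb = rng.normal(scale=0.1, size=(num_relations, m))
G_e = np.full_like(entity_emb, 1e-8)             # AdaGrad squared-gradient accumulators
G_r = np.full_like(rel_emb, 1e-8)

def score(h, r, t):
    # Diagonal bilinear score, used purely for illustration.
    return float(np.dot(entity_emb[h] * rel_emb[r], entity_emb[t]))

def adagrad_step(param, G, idx, grad):
    # Adapt the step size from historical gradients (Duchi et al., 2011).
    G[idx] += grad ** 2
    param[idx] -= lr * grad / np.sqrt(G[idx])

def sgd_update(h, r, t, label):
    # Logistic loss on one positive (label=+1) or sampled negative (label=-1) triple.
    sigma = 1.0 / (1.0 + np.exp(-label * score(h, r, t)))
    g = -label * (1.0 - sigma)                   # d(loss)/d(score)
    gh = g * rel_emb[r] * entity_emb[t] + lam * entity_emb[h]
    gt = g * rel_emb[r] * entity_emb[h] + lam * entity_emb[t]
    gr = g * entity_emb[h] * entity_emb[t] + lam * rel_emb[r]
    adagrad_step(entity_emb, G_e, h, gh)
    adagrad_step(entity_emb, G_e, t, gt)
    adagrad_step(rel_emb, G_r, r, gr)
```

In the asynchronous (Hogwild-style) setting, multiple threads run such updates on different mini-batches and write to the shared parameters without locking, which loses little accuracy when the parameters touched by concurrent updates rarely overlap (Recht et al., 2011).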
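The hyperparameter search in the Experiment Setup row is an exhaustive grid over embedding size, weight decay, and negative-sampling ratio, selected by filtered MRR on the validation set. A sketch of that loop, where `train_and_eval` is a hypothetical stand-in for training ANALOGY for 500 epochs (initial learning rate 0.1, AdaGrad) and returning the validation filtered MRR:

```python
from itertools import product

grid = {
    "m":            [100, 150, 200],     # embedding size
    "weight_decay": [1e-1, 1e-2, 1e-3],  # L2 factor on v and W
    "neg_ratio":    [3, 6],              # negatives per positive
}

def train_and_eval(m, weight_decay, neg_ratio):
    """Hypothetical helper: train for 500 epochs and return validation filtered MRR."""
    raise NotImplementedError

best_mrr, best_cfg = float("-inf"), None
for m, lam, alpha in product(grid["m"], grid["weight_decay"], grid["neg_ratio"]):
    mrr = train_and_eval(m=m, weight_decay=lam, neg_ratio=alpha)
    if mrr > best_mrr:
        best_mrr, best_cfg = mrr, {"m": m, "weight_decay": lam, "neg_ratio": alpha}
print("best config:", best_cfg, "validation MRR:", best_mrr)
```

The selections reported in the paper (m = 200, λ = 10⁻², α = 3 for WN18; m = 200, λ = 10⁻³, α = 6 for FB15K) are each one cell of this 3 × 3 × 2 grid.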