Lorentzian Distance Learning for Hyperbolic Representations
Authors: Marc Law, Renjie Liao, Jake Snell, Richard Zemel
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach obtains state-of-the-art results in retrieval and classification tasks on different datasets. We evaluate the Lorentzian distance in three different tasks. |
| Researcher Affiliation | Collaboration | ¹University of Toronto, Canada; ²Vector Institute, Canada; ³NVIDIA, work done while affiliated with the University of Toronto. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We use the source code available at https://github.com/facebookresearch/poincare-embeddings', but this refers to the baseline's code, not an implementation of the authors' own method. |
| Open Datasets | Yes | We consider the following datasets: (1) the 2012 ACM Computing Classification System; (2) EuroVoc; (3) Medical Subject Headings (MeSH) (Rogers, 1963), a medical thesaurus provided by the U.S. National Library of Medicine; (4) WordNet (Miller, 1998), a large lexical database. ... on the CIFAR-100 (Krizhevsky & Hinton, 2009) dataset |
| Dataset Splits | No | For each subtree, they consider that every node that belongs to it is positive and all the other nodes of WordNet nouns are negative. They then select 80% of the positive nodes for training and the rest for test, and select the same percentage of negative nodes for training and test. (A minimal sketch of this split appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | Yes | Following (Nickel & Kiela, 2017), we implemented our method in PyTorch 0.3.1. |
| Experiment Setup | Yes | We use the standard SGD optimizer with a learning rate of 0.1 and momentum of 0.9. For the largest datasets, WordNet Nouns and MeSH, we stop training after 1500 epochs; we stop training at 3000 epochs for the other datasets. The mini-batch size is 50, and the number of sampled negatives per example is 50. The weights of the embeddings are initialized from the continuous uniform distribution in the interval [-10⁻⁴, 10⁻⁴]. The dimensionality of our embeddings is 10. (A sketch wiring these values into a training skeleton appears after the table.) |
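The Dataset Splits row quotes the subtree binary-classification protocol (from Nickel & Kiela, 2017) that the paper follows, but no splitting code accompanies the paper. The sketch below is a minimal reading of that description, not the authors' code; the function name, the node lists, and the fixed seed are assumptions.

```python
import random

def split_subtree_nodes(positive_nodes, negative_nodes, train_frac=0.8, seed=0):
    """Hypothetical 80/20 per-subtree split: the same fraction of positive and
    negative nodes goes to training, the remainder to test."""
    rng = random.Random(seed)
    pos, neg = list(positive_nodes), list(negative_nodes)
    rng.shuffle(pos)
    rng.shuffle(neg)
    n_pos, n_neg = int(train_frac * len(pos)), int(train_frac * len(neg))
    train = pos[:n_pos] + neg[:n_neg]
    test = pos[n_pos:] + neg[n_neg:]
    return train, test
```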
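The Experiment Setup row fixes the optimizer, initialization, and sampling hyperparameters, but again no training code is released. Below is a minimal sketch that wires those quoted values into a current PyTorch skeleton (the paper itself used PyTorch 0.3.1); `NUM_NODES`, `training_step`, and the `loss_fn` placeholder are assumptions, and the paper's Lorentzian-distance ranking loss is not reproduced here.

```python
import torch
from torch import nn

NUM_NODES = 10_000   # assumption: depends on the dataset (e.g. WordNet Nouns)
EMBED_DIM = 10       # "The dimensionality of our embeddings is 10."
BATCH_SIZE = 50      # "The mini-batch size is 50"
NUM_NEGATIVES = 50   # "the number of sampled negatives per example is 50"

embeddings = nn.Embedding(NUM_NODES, EMBED_DIM)
# "initialized from the continuous uniform distribution in the interval [-10^-4, 10^-4]"
nn.init.uniform_(embeddings.weight, a=-1e-4, b=1e-4)

# "standard SGD optimizer with a learning rate of 0.1 and momentum of 0.9"
optimizer = torch.optim.SGD(embeddings.parameters(), lr=0.1, momentum=0.9)

def training_step(anchor_idx, positive_idx, negative_idx, loss_fn):
    """One mini-batch update. loss_fn stands in for the paper's
    Lorentzian-distance-based ranking loss, which is not reproduced here."""
    optimizer.zero_grad()
    loss = loss_fn(embeddings(anchor_idx),
                   embeddings(positive_idx),
                   embeddings(negative_idx))
    loss.backward()
    optimizer.step()
    return loss.item()
```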