Lorentzian Distance Learning for Hyperbolic Representations

Authors: Marc Law, Renjie Liao, Jake Snell, Richard Zemel

ICML 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'Our approach obtains state-of-the-art results in retrieval and classification tasks on different datasets. We evaluate the Lorentzian distance in three different tasks.' |
| Researcher Affiliation | Collaboration | 1 University of Toronto, Canada; 2 Vector Institute, Canada; 3 NVIDIA (work done while affiliated with the University of Toronto). |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We use the source code available at https://github.com/facebookresearch/poincare-embeddings', but this refers to the baseline's code, not to code for the authors' own method. |
| Open Datasets | Yes | 'We consider the following datasets: (1) 2012 ACM Computing Classification System; (2) EuroVoc; (3) Medical Subject Headings (MeSH) (Rogers, 1963), a medical thesaurus provided by the U.S. National Library of Medicine; (4) WordNet (Miller, 1998), a large lexical database. ... on the CIFAR-100 (Krizhevsky & Hinton, 2009) dataset.' |
| Dataset Splits | No | The paper only restates the protocol of prior work (referred to as 'they'): 'For each subtree, they consider that every node that belongs to it is positive and all the other nodes of WordNet nouns are negative. They then select 80% of the positive nodes for training, the rest for test. They select the same percentage of negative nodes for training and test.' Only training and test sets are covered; no validation split is described. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | Yes | 'Following (Nickel & Kiela, 2017), we implemented our method in PyTorch 0.3.1.' |
| Experiment Setup | Yes | 'We use the standard SGD optimizer with a learning rate of 0.1 and momentum of 0.9. For the largest datasets, WordNet Nouns and MeSH, we stop training after 1500 epochs. We stop training at 3000 epochs for the other datasets. The mini-batch size is 50, and the number of sampled negatives per example is 50. The weights of the embeddings are initialized from the continuous uniform distribution in the interval [−10⁻⁴, 10⁻⁴]. The dimensionality of our embeddings is 10.' |
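
To make the quoted setup concrete, below is a minimal PyTorch sketch of the training step these hyperparameters describe. It is a reconstruction, not the authors' released code: the squared Lorentzian distance d²(a, b) = −2β − 2⟨a, b⟩_L on the hyperboloid H^{d,β} follows the paper, but the function names, the placeholder vocabulary size, and the softmax-over-negatives objective (in the style of Nickel & Kiela, 2017) are assumptions on our part.

```python
import torch
import torch.nn.functional as F

def lorentzian_inner(u, v):
    """Lorentzian scalar product <u, v>_L = -u_0 v_0 + sum_i u_i v_i."""
    return -u[..., 0] * v[..., 0] + (u[..., 1:] * v[..., 1:]).sum(dim=-1)

def lift_to_hyperboloid(x, beta=1.0):
    """Lift spatial coordinates x in R^d onto the hyperboloid H^{d,beta}
    by setting the time-like coordinate x_0 = sqrt(||x||^2 + beta)."""
    x0 = torch.sqrt(x.pow(2).sum(dim=-1, keepdim=True) + beta)
    return torch.cat([x0, x], dim=-1)

def squared_lorentzian_distance(u, v, beta=1.0):
    """d^2(u, v) = -2*beta - 2*<u, v>_L for points u, v on H^{d,beta}."""
    return -2.0 * beta - 2.0 * lorentzian_inner(u, v)

# Hyperparameters quoted in the table above; the vocabulary size is a
# dataset-dependent placeholder.
num_nodes, dim = 10_000, 10
batch_size, num_negatives = 50, 50

emb = torch.nn.Embedding(num_nodes, dim)
torch.nn.init.uniform_(emb.weight, -1e-4, 1e-4)  # U(-10^-4, 10^-4) init
opt = torch.optim.SGD(emb.parameters(), lr=0.1, momentum=0.9)

def train_step(u_idx, cand_idx):
    """One step on a batch of anchor indices u_idx (B,) and candidate
    indices cand_idx (B, 1 + num_negatives), where column 0 holds the
    positive node and the remaining columns hold sampled negatives
    (a softmax objective assumed here, following Nickel & Kiela, 2017)."""
    u = lift_to_hyperboloid(emb(u_idx)).unsqueeze(1)  # (B, 1, dim+1)
    cand = lift_to_hyperboloid(emb(cand_idx))         # (B, 51, dim+1)
    d2 = squared_lorentzian_distance(u, cand)         # (B, 51)
    target = torch.zeros(u_idx.size(0), dtype=torch.long)  # positive at column 0
    loss = F.cross_entropy(-d2, target)  # pull positive closer than negatives
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

β is fixed to 1 here; the paper treats it as a curvature hyperparameter and studies its effect. Because points are parameterized by their spatial coordinates and lifted onto the hyperboloid, the standard SGD optimizer with momentum quoted above can be used directly, without Riemannian optimization.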