A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning

Authors: Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet.
Researcher Affiliation | Collaboration | 1. Department of Complexity Science and Engineering, The University of Tokyo, Japan; 2. Preferred Networks, Inc., Japan.
Pseudocode | Yes | Algorithm 1 is an algorithmic description of the sampling procedure based on our construction. (A sketch of this procedure follows the table.)
Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a repository for the described methodology.
Open Datasets | Yes | We applied Hyperbolic VAE to a binarized version of MNIST. [...] We trained probabilistic word embedding models with the WordNet nouns dataset (Miller, 1998).
Dataset Splits | Yes | We amassed a set of trajectories whose total length is 100,000, of which we used 80,000 as the training set, 10,000 as the validation set, and 10,000 as the test set. (A split sketch follows the table.)
Hardware Specification | No | The paper does not mention any specific hardware details such as GPU or CPU models, or cloud computing specifications used for running experiments.
Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to reproduce the experiments.
Experiment Setup | Yes | We used a Multi-Layer Perceptron (MLP) of depth 3 and 100 hidden units at each layer for both encoder and decoder. For the activation function we used tanh. [...] We used an MLP of depth 3 and 500 hidden units at each layer for both the encoder and the decoder. [...] In particular, we initialized each weight in the first linear part of the embedding by N(0, 0.01). We treated the first 50 epochs as a burn-in phase and reduced the learning rate by a factor of 40 after the burn-in phase. (A configuration sketch follows the table.)
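The sampling procedure summarized under Pseudocode draws a Euclidean normal vector in the tangent space at the hyperboloid's origin, parallel-transports it to the tangent space at mu, and maps it onto the manifold with the exponential map. The NumPy sketch below illustrates that construction using the standard hyperboloid-model formulas; the function names and the use of a full covariance matrix are illustrative choices, not the paper's reference implementation.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_{i>=1} xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def parallel_transport(v, nu, mu):
    """Parallel transport of a tangent vector v from T_nu H^n to T_mu H^n."""
    alpha = -lorentz_inner(nu, mu)
    return v + lorentz_inner(mu, v) / (alpha + 1.0) * (nu + mu)

def exp_map(u, mu):
    """Exponential map of a tangent vector u at mu onto the hyperboloid."""
    norm_u = np.sqrt(np.clip(lorentz_inner(u, u), 0.0, None))
    if norm_u == 0.0:
        return mu
    return np.cosh(norm_u) * mu + np.sinh(norm_u) * u / norm_u

def sample_wrapped_normal(mu, sigma, rng=None):
    """Draw one sample from the wrapped normal on H^n centered at mu.

    mu    : point on the hyperboloid in R^{n+1} (Minkowski coordinates)
    sigma : (n, n) covariance of the Euclidean normal in the tangent space
    """
    rng = np.random.default_rng() if rng is None else rng
    n = mu.shape[0] - 1
    mu0 = np.zeros(n + 1)
    mu0[0] = 1.0                            # origin of the hyperboloid
    v_tilde = rng.multivariate_normal(np.zeros(n), sigma)
    v = np.concatenate(([0.0], v_tilde))    # tangent vector at the origin
    u = parallel_transport(v, mu0, mu)      # move it to the tangent space at mu
    return exp_map(u, mu)                   # project onto the manifold

# Example: one sample on H^2 around a point reached from the origin.
mu0 = np.array([1.0, 0.0, 0.0])
mu = exp_map(np.array([0.0, 0.3, -0.2]), mu0)
z = sample_wrapped_normal(mu, 0.1 * np.eye(2))
```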
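The Dataset Splits row describes an 80,000/10,000/10,000 division of Breakout trajectory data totaling 100,000 steps. A minimal sketch of such a split, assuming the observations sit in a single indexable collection and that the split is drawn by random permutation (the paper's quoted text does not say how the split was made):

```python
import numpy as np

# Hypothetical setup: only the 80k/10k/10k sizes come from the quoted text.
rng = np.random.default_rng(0)
indices = rng.permutation(100_000)
train_idx, valid_idx, test_idx = np.split(indices, [80_000, 90_000])
assert len(train_idx) == 80_000 and len(valid_idx) == 10_000 and len(test_idx) == 10_000
```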
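The Experiment Setup row quotes the MNIST encoder/decoder sizes (depth-3 MLPs with 100 tanh units per layer) and the WordNet learning-rate schedule (50 burn-in epochs, then a reduction by a factor of 40). Below is a PyTorch sketch of those two pieces, assuming a hypothetical latent dimension and the Adam optimizer, neither of which is fixed by the quoted text; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Assumed sizes: latent_dim and reading "depth 3" as three hidden layers are
# guesses; only the 100 tanh units per layer come from the quoted setup.
x_dim, h_dim, latent_dim = 784, 100, 2

def mlp(in_dim, out_dim, hidden=h_dim, depth=3):
    """MLP with `depth` hidden layers of `hidden` tanh units each."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.Tanh()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

encoder = mlp(x_dim, 2 * latent_dim)  # placeholder head for latent parameters
decoder = mlp(latent_dim, x_dim)      # Bernoulli logits for binarized MNIST

# Schedule quoted for the WordNet embeddings: 50 burn-in epochs, then reduce
# the learning rate by a factor of 40 (optimizer choice is an assumption).
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50], gamma=1.0 / 40)
```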