A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning
Authors: Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet. |
| Researcher Affiliation | Collaboration | (1) Department of Complexity Science and Engineering, The University of Tokyo, Japan; (2) Preferred Networks, Inc., Japan. |
| Pseudocode | Yes | Algorithm 1 is an algorithmic description of the sampling procedure based on our construction. (A sketch of this procedure follows the table.) |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a repository for the described methodology. |
| Open Datasets | Yes | We applied Hyperbolic VAE to a binarized version of MNIST. [...] We trained probabilistic word embedding models with the WordNet nouns dataset (Miller, 1998) |
| Dataset Splits | Yes | We amassed a set of trajectories whose total length is 100,000, of which we used 80,000 as the training set, 10,000 as the validation set, and 10,000 as the test set. |
| Hardware Specification | No | The paper does not mention any specific hardware details such as GPU or CPU models, or cloud computing specifications used for running experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with their version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | We used a Multi-Layer Perceptron (MLP) of depth 3 and 100 hidden variables at each layer for both encoder and decoder. For the activation function we used tanh. [...] We used an MLP of depth 3 and 500 hidden units at each layer for both the encoder and the decoder. [...] In particular, we initialized each weight in the first linear part of the embedding by N(0, 0.01). We treated the first 50 epochs as a burn-in phase and reduced the learning rate by a factor of 40 after the burn-in phase. (Hedged sketches of this setup follow the table.) |
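
The Pseudocode row above refers to Algorithm 1, which samples from the wrapped normal G(μ, Σ) on the Lorentz model of hyperbolic space in three steps: draw a Gaussian vector in the tangent space at the origin, parallel-transport it to the tangent space at μ, and map it onto the hyperboloid with the exponential map. The following is a minimal NumPy sketch of that construction, assuming the standard Lorentz-model formulas for parallel transport and the exponential map; the function names and the numerical clamping are mine, not the paper's.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def parallel_transport(v, mu0, mu):
    """Parallel-transport tangent vector v from mu0 to mu on the hyperboloid."""
    alpha = -lorentz_inner(mu0, mu)
    return v + lorentz_inner(mu - alpha * mu0, v) / (alpha + 1.0) * (mu0 + mu)

def exp_map(u, mu):
    """Exponential map at mu: wrap tangent vector u onto the hyperboloid."""
    norm_u = np.sqrt(np.clip(lorentz_inner(u, u), 1e-12, None))  # Lorentz norm, clamped for stability
    return np.cosh(norm_u) * mu + np.sinh(norm_u) * (u / norm_u)

def sample_wrapped_normal(mu, Sigma, rng=None):
    """Draw one sample from the wrapped normal G(mu, Sigma) on H^n (Lorentz model)."""
    if rng is None:
        rng = np.random.default_rng()
    n = mu.shape[0] - 1                     # the ambient space is R^(n+1)
    mu0 = np.zeros(n + 1)
    mu0[0] = 1.0                            # origin of the hyperboloid
    v_tilde = rng.multivariate_normal(np.zeros(n), Sigma)
    v = np.concatenate(([0.0], v_tilde))    # tangent vector at the origin
    u = parallel_transport(v, mu0, mu)      # move it to the tangent space at mu
    return exp_map(u, mu)                   # project onto the manifold

# Example: a sample around the origin of H^2 with a small isotropic covariance.
z = sample_wrapped_normal(np.array([1.0, 0.0, 0.0]), 0.1 * np.eye(2))
```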
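
The Experiment Setup row quotes a depth-3 MLP with tanh activations and 100 hidden units per layer for both the encoder and the decoder of the MNIST Hyperbolic VAE, plus a 50-epoch burn-in followed by a learning-rate reduction by a factor of 40 for the WordNet embedding experiment. Below is a minimal PyTorch sketch under those quoted hyperparameters; reading 'depth 3' as three linear layers, the 784-dimensional input, the latent size, the output parameterization, the optimizer, and the base learning rate are all my own illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, hidden_dim, out_dim):
    """Depth-3 MLP with tanh activations (read here as three linear layers)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim), nn.Tanh(),
        nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
        nn.Linear(hidden_dim, out_dim),
    )

# Hypothetical instantiation for the binarized-MNIST Hyperbolic VAE:
# 100 hidden units per layer as quoted; latent size and output heads are assumptions.
z_dim = 2
encoder = make_mlp(784, 100, 2 * z_dim)   # e.g. tangent-space mean and log-scale of Sigma
decoder = make_mlp(z_dim + 1, 100, 784)   # Bernoulli logits; +1 for the Lorentz ambient coordinate

# Learning-rate schedule quoted for the WordNet embedding experiment (50 burn-in epochs,
# then reduce the rate by a factor of 40), shown here on the VAE parameters purely to
# illustrate the schedule; the base rate of 1e-3 is an assumption.
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50], gamma=1.0 / 40)
```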