Hyperbolic Neural Networks++

Authors: Ryohei Shimizu, Yusuke Mukuta, Tatsuya Harada

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show the superior parameter efficiency of our methods compared to conventional hyperbolic components, and stability and outperformance over their Euclidean counterparts. In this section, we evaluate our methods in comparisons with HNNs and Euclidean counterparts.
Researcher Affiliation | Academia | Ryohei Shimizu (1), Yusuke Mukuta (1,2), Tatsuya Harada (1,2); (1) The University of Tokyo, (2) RIKEN AIP
Pseudocode | No | The paper describes methods through mathematical formulations and textual descriptions but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/mil-tokyo/hyperbolic_nn_plusplus (paper footnote 1).
Open Datasets | Yes | We pre-trained the Poincaré embeddings of the same dimensions as the experimental settings in HNNs, i.e., two, three, five, and ten dimensions, using the open-source implementation (footnote 2: https://github.com/facebookresearch/poincare-embeddings) to extract several sub-trees whose root nodes are certain abstract hypernymies, e.g., animal. For each sub-tree, MLR layers learn the binary classification to predict whether each given node is included. All nodes are divided into 80% training nodes and 20% testing nodes. (An illustrative node-split sketch follows the table.)
Dataset Splits | No | All nodes are divided into 80% training nodes and 20% testing nodes.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory specifications, or cloud instances) used for running its experiments.
Software Dependencies | No | The implementation of hyperbolic architectures is based on Geoopt (Kochurov et al., 2020). We follow the open-source implementation of Fairseq (Ott et al., 2019). (A minimal Geoopt setup sketch follows the table.)
Experiment Setup | Yes | We trained each model for 30 epochs using Riemannian Adam (Becigneul & Ganea, 2019) with a learning rate of 0.001 and a batch size of 16. For the scheduling of the learning rate η, we linearly increased the learning rate for the first 4000 iterations as a warm-up, and utilized the inverse square root decay with respect to the number of iterations t thereafter, as η = (Dt)^{-1/2}. (A sketch of this schedule follows the table.)
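
The Open Datasets row describes extracting sub-trees of pre-trained Poincaré embeddings and training MLR classifiers on an 80/20 node split. Below is a minimal sketch of that split and labeling step, assuming the embeddings are already loaded into a dict keyed by node name; `split_nodes`, `embeddings`, and `subtree_nodes` are illustrative names, not taken from the released code.

```python
import random

def split_nodes(nodes, train_frac=0.8, seed=0):
    """Shuffle the nodes and split them into 80% training / 20% testing."""
    rng = random.Random(seed)
    nodes = list(nodes)
    rng.shuffle(nodes)
    cut = int(train_frac * len(nodes))
    return nodes[:cut], nodes[cut:]

# Illustrative usage: `embeddings` maps every embedded node to its Poincaré
# vector, and `subtree_nodes` is the set of nodes under one root (e.g. "animal").
# The binary MLR target is 1 if a node lies inside the sub-tree, else 0.
# train_nodes, test_nodes = split_nodes(embeddings.keys())
# labels = {n: int(n in subtree_nodes) for n in embeddings}
```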
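The Software Dependencies row notes that the hyperbolic architectures are built on Geoopt. The following is a small sketch, not the authors' implementation, of how a Poincaré ball, a manifold-aware parameter, and a Riemannian Adam optimizer are typically set up with Geoopt; the curvature value and tensor shape are placeholders.

```python
import torch
import geoopt  # Kochurov et al., 2020

# Poincaré ball model of hyperbolic space; the curvature is fixed here,
# which is a simplification made only for this sketch.
ball = geoopt.PoincareBall(c=1.0)

# A point on the ball stored as a manifold-aware parameter so that a
# Riemannian optimizer applies the proper exponential-map update.
z = geoopt.ManifoldParameter(ball.expmap0(torch.randn(10) * 1e-2),
                             manifold=ball)

# Riemannian Adam (Becigneul & Ganea, 2019) as provided by Geoopt.
optimizer = geoopt.optim.RiemannianAdam([z], lr=1e-3)
```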
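The Experiment Setup row specifies a linear warm-up over the first 4000 iterations followed by inverse square root decay, η = (Dt)^{-1/2}. A hedged sketch of that schedule is below; treating D as the model dimension (with 512 as a placeholder) is an assumption of this sketch, following the common Transformer-style schedule available in Fairseq.

```python
def learning_rate(t, D=512, warmup=4000):
    """Linear warm-up for the first `warmup` iterations, then inverse
    square root decay eta = (D * t) ** -0.5. Treating D as the model
    dimension (512) is an assumption of this sketch."""
    t = max(t, 1)
    peak = (D * warmup) ** -0.5      # value reached at the end of warm-up
    if t < warmup:
        return peak * t / warmup     # linear increase toward the peak
    return (D * t) ** -0.5           # inverse square root decay
```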