Hyperbolic Attention Networks

Authors: Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our models on synthetic and real-world tasks. Experiments where the underlying graph structure is explicitly known clearly show the benefits of using hyperbolic geometry as an inductive bias.
Researcher Affiliation | Academia | No explicit institutional affiliations or email domains are provided within the text of the paper to classify the authors' affiliations.
Pseudocode | No | The paper describes the methods textually and with equations but does not include any explicit pseudocode blocks or algorithms.
Open Source Code | Yes | We use a publicly available version: https://github.com/tensorflow/tensor2tensor
Open Datasets | Yes | We evaluate all the models on the WMT14 En-De dataset (Bojar et al., 2014). We use two of the standard graph transduction benchmark datasets, Citeseer and Cora (Sen et al., 2008).
Dataset Splits | No | The paper mentions generating training data and using test sets, but does not explicitly provide specific train/validation/test dataset splits (e.g., percentages, sample counts, or clear references to predefined splits within the paper's text).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using 'Tensor2tensor' in a footnote, but it does not specify version numbers for this or any other software dependencies required to replicate the experiments.
Experiment Setup | Yes | We use models with 3 recursive self-attention layers, each of which has 4 heads with 4 units each for each of q, k, and v.
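To make the quoted experiment-setup hyperparameters concrete, the following is a minimal NumPy sketch of a standard (Euclidean) multi-head self-attention stack with the reported sizes: 3 layers, 4 heads, and 4 units per head for each of q, k, and v. It is not the paper's hyperbolic attention, which replaces the Euclidean matching and aggregation steps with hyperbolic-geometry operations and is built on tensor2tensor; the residual connection, the weight initialization, and every function name below are illustrative assumptions.

```python
import numpy as np

# Sizes quoted in the "Experiment Setup" row above.
NUM_LAYERS = 3   # recursive self-attention layers
NUM_HEADS = 4    # attention heads per layer
HEAD_DIM = 4     # units per head for each of q, k, and v
MODEL_DIM = NUM_HEADS * HEAD_DIM  # 16

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, params):
    """Standard scaled dot-product multi-head self-attention.

    x: (seq_len, MODEL_DIM) -> (seq_len, MODEL_DIM).
    """
    heads = []
    for wq, wk, wv in params["heads"]:
        q, k, v = x @ wq, x @ wk, x @ wv         # each (seq_len, HEAD_DIM)
        scores = q @ k.T / np.sqrt(HEAD_DIM)     # (seq_len, seq_len)
        heads.append(softmax(scores) @ v)        # (seq_len, HEAD_DIM)
    return np.concatenate(heads, axis=-1) @ params["wo"]

def init_layer(rng):
    # Illustrative Gaussian initialization (an assumption, not from the paper).
    scale = 1.0 / np.sqrt(MODEL_DIM)
    head = lambda: tuple(rng.normal(0, scale, (MODEL_DIM, HEAD_DIM)) for _ in range(3))
    return {"heads": [head() for _ in range(NUM_HEADS)],
            "wo": rng.normal(0, scale, (MODEL_DIM, MODEL_DIM))}

rng = np.random.default_rng(0)
layers = [init_layer(rng) for _ in range(NUM_LAYERS)]

x = rng.normal(size=(10, MODEL_DIM))             # toy sequence of length 10
for layer in layers:
    x = x + multi_head_self_attention(x, layer)  # residual connection (assumed)
print(x.shape)  # (10, 16)
```

The sketch only pins down the layer/head/unit counts; every other design choice (residuals, normalization, initialization) would have to be taken from the tensor2tensor configuration the authors used.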