Hyperbolic Attention Networks
Authors: Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, Nando de Freitas
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our models on synthetic and real-world tasks. Experiments where the underlying graph structure is explicitly known clearly show the benefits of using hyperbolic geometry as an inductive bias. |
| Researcher Affiliation | Academia | No explicit institutional affiliations or email domains are provided within the text of the paper to classify the authors' affiliations. |
| Pseudocode | No | The paper describes the methods textually and with equations but does not include any explicit pseudocode blocks or algorithms. |
| Open Source Code | Yes | We use a publicly available version: https://github.com/tensorflow/tensor2tensor |
| Open Datasets | Yes | We evaluate all the models on the WMT14 En-De dataset (Bojar et al., 2014). We use two of the standard graph transduction benchmark datasets, Citeseer and Cora (Sen et al., 2008). |
| Dataset Splits | No | The paper mentions generating training data and using test sets, but does not provide explicit train/validation/test splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Tensor2tensor' in a footnote, but it does not specify version numbers for this or any other software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | We use models with 3 recursive self-attention layers, each of which has 4 heads with 4 units each for each of q, k, and v. |
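
The "Experiment Setup" row quotes an architecture of 3 recursive self-attention layers, each with 4 heads and 4 units per head for q, k, and v. The sketch below is a minimal NumPy illustration of what such a configuration looks like using ordinary scaled dot-product attention; it is not the paper's hyperbolic attention, and the 16-dimensional model width, residual connections, and all parameter names are assumptions made only to make the quoted sizes concrete.

```python
# Illustrative sketch of the quoted configuration: 3 self-attention layers,
# 4 heads, 4 units per head for each of q, k, and v. Standard Euclidean
# scaled dot-product attention, NOT the paper's hyperbolic attention.
import numpy as np

NUM_LAYERS = 3   # "3 recursive self-attention layers"
NUM_HEADS = 4    # "4 heads"
HEAD_DIM = 4     # "4 units each for each of q, k, and v"
MODEL_DIM = NUM_HEADS * HEAD_DIM  # 16-dim token representation (assumed)

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, params):
    """x: [seq_len, MODEL_DIM] -> [seq_len, MODEL_DIM]."""
    heads = []
    for wq, wk, wv in params["heads"]:
        q, k, v = x @ wq, x @ wk, x @ wv            # each [seq_len, HEAD_DIM]
        scores = (q @ k.T) / np.sqrt(HEAD_DIM)      # [seq_len, seq_len]
        heads.append(softmax(scores) @ v)           # [seq_len, HEAD_DIM]
    return np.concatenate(heads, axis=-1) @ params["wo"]  # back to MODEL_DIM

def init_layer():
    return {
        "heads": [tuple(rng.normal(scale=0.1, size=(MODEL_DIM, HEAD_DIM))
                        for _ in range(3))            # one (wq, wk, wv) per head
                  for _ in range(NUM_HEADS)],
        "wo": rng.normal(scale=0.1, size=(MODEL_DIM, MODEL_DIM)),
    }

layers = [init_layer() for _ in range(NUM_LAYERS)]

tokens = rng.normal(size=(10, MODEL_DIM))  # 10 dummy tokens
h = tokens
for layer in layers:
    h = h + multi_head_self_attention(h, layer)  # residual connection (assumed)
print(h.shape)  # (10, 16)
```

In the paper itself, the attention weights and aggregation are computed with hyperbolic distances and Einstein midpoints rather than the dot products and weighted sums shown here; the sketch only fixes the layer, head, and unit counts reported in the setup.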