Poincaré GloVe: Hyperbolic Word Embeddings

Authors: Alexandru Țifrea*, Gary Bécigneul*, Octavian-Eugen Ganea*

ICLR 2019

Reproducibility assessment. Each variable below is listed with its assessed result, followed by the supporting LLM response (a quote from the paper where one exists).
Research Type: Experimental
"Empirically, based on extensive experiments, we prove that our embeddings, trained unsupervised, are the first to simultaneously outperform strong and popular baselines on the tasks of similarity, analogy and hypernymy detection."
Researcher Affiliation: Academia
"Alexandru Țifrea, Gary Bécigneul, Octavian-Eugen Ganea. Department of Computer Science, ETH Zürich, Switzerland. tifreaa@ethz.ch, {gary.becigneul,octavian.ganea}@inf.ethz.ch"
Pseudocode: Yes
"Algorithm 1: is-a(v, w) hypernymy score using Poincaré embeddings" (an illustrative sketch follows below)
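The summary above quotes only the algorithm's caption. For orientation, the sketch below shows what a norm-based hypernymy score over Poincaré embeddings can look like, following the generic score of Nickel & Kiela (2017) rather than the paper's actual Algorithm 1; `poincare_distance`, `is_a_score`, and the scaling constant `alpha` are all illustrative assumptions.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between points u, v inside the unit (Poincare) ball."""
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u * u)) * (1.0 - np.sum(v * v))
    return np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps))

def is_a_score(v, w, alpha=1000.0):
    """Illustrative score for 'v is-a w' (i.e., w is a hypernym of v).
    General terms tend to embed near the origin of the ball, so a
    hypernym w with a smaller norm than v yields a higher score.
    This follows Nickel & Kiela (2017), NOT the paper's Algorithm 1;
    alpha is an assumed scaling constant."""
    norm_gap = np.linalg.norm(w) - np.linalg.norm(v)
    return -(1.0 + alpha * norm_gap) * poincare_distance(v, w)
```

Predicting the is-a relation then amounts to thresholding such a score, which is where the cross-validated threshold t reported below comes in.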
Open Source Code: Yes
"Our code is publicly available: https://github.com/alex-tifrea/poincare_glove"
Open Datasets: Yes
"We trained all models on a corpus provided by Levy & Goldberg (2014); Levy et al. (2015) used in other word embeddings related work. Corpus preprocessing is explained in the above references. The dataset has been obtained from an English Wikipedia dump and contains 1.4 billion tokens."
Dataset Splits: Yes
"In order to select the best t without overfitting on the benchmark dataset, we used the same 2-fold cross-validation method used by Levy et al. (2015, section 5.1) (see our Table 15), which resulted in selecting t = 0.3. We report our main results in Table 4, and more extensive experiments in various settings (including in lower dimensions) in appendix A.2." (the selection procedure is sketched below)
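Concretely, the 2-fold procedure can be sketched as follows, assuming t acts as a decision threshold on precomputed is-a scores (the paper's t may instead parameterize the score itself; the 50/50 split, the grid, and the function names here are assumptions):

```python
import numpy as np

def two_fold_threshold_selection(scores, labels, t_grid, seed=0):
    """Pick a threshold t without overfitting the benchmark, in the
    spirit of Levy et al. (2015), section 5.1: tune t on one half of
    the labeled pairs, test on the other half, then swap and average."""
    scores = np.asarray(scores)
    labels = np.asarray(labels, dtype=bool)
    idx = np.random.default_rng(seed).permutation(len(scores))
    fold_a, fold_b = np.array_split(idx, 2)

    def accuracy(subset, t):
        return np.mean((scores[subset] > t) == labels[subset])

    chosen_ts, test_accs = [], []
    for train, test in [(fold_a, fold_b), (fold_b, fold_a)]:
        best_t = max(t_grid, key=lambda t: accuracy(train, t))
        chosen_ts.append(best_t)
        test_accs.append(accuracy(test, best_t))
    return chosen_ts, float(np.mean(test_accs))
```

For example, `two_fold_threshold_selection(scores, labels, np.linspace(0.0, 1.0, 101))` returns the per-fold thresholds and the averaged held-out accuracy.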
Hardware Specification: No
The paper does not specify the hardware used for training or experimentation, such as specific CPU/GPU models, memory, or cloud instance types.
Software Dependencies: No
The paper mentions optimizers like ADAGRAD and RADAGRAD but does not provide specific version numbers for any software dependencies (e.g., Python, TensorFlow, PyTorch, or specific library versions).
Experiment Setup: Yes
"All models were trained for 50 epochs, and unless stated otherwise, on the full corpus of 189,533 word types. ... For the Euclidean baseline as well as for models with h(x) = x^2 we used a learning rate of 0.05. For Poincaré models with h(x) = cosh^2(x) we used a learning rate of 0.01." (a sketch of a Riemannian Adagrad update follows below)
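ADAGRAD above refers to the standard Euclidean optimizer used for the baseline, and RADAGRAD (Riemannian Adagrad) to its Riemannian counterpart used for the hyperbolic models. A minimal sketch of one RADAGRAD-style step on the Poincaré ball of curvature -1 follows; the scalar accumulator and the function names are assumptions, not the authors' implementation.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition, the ball's analogue of vector addition."""
    xy, xx, yy = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1.0 + 2.0 * xy + yy) * x + (1.0 - xx) * y
    return num / (1.0 + 2.0 * xy + xx * yy)

def exp_map(x, v, eps=1e-12):
    """Exponential map at x: move along the geodesic with velocity v."""
    lam = 2.0 / (1.0 - np.dot(x, x))  # conformal factor lambda_x
    n = np.linalg.norm(v)
    if n < eps:
        return x
    return mobius_add(x, np.tanh(lam * n / 2.0) * v / n)

def radagrad_step(x, euc_grad, accum, lr=0.01, eps=1e-10):
    """One Riemannian Adagrad step (sketch). The Riemannian gradient
    rescales the Euclidean one by 1/lambda_x^2, the squared-norm
    accumulator adapts the step size as in Adagrad, and the update is
    applied with the exponential map so x stays inside the ball."""
    lam = 2.0 / (1.0 - np.dot(x, x))
    rgrad = euc_grad / lam ** 2
    accum = accum + np.dot(rgrad, rgrad)
    return exp_map(x, -lr * rgrad / (np.sqrt(accum) + eps)), accum
```

Under this reading, the quoted learning rates plug in directly: lr = 0.05 for the Euclidean baseline and the h(x) = x^2 models, and lr = 0.01 for the Poincaré models with h(x) = cosh^2(x).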