Hierarchical Density Order Embeddings

Authors: Ben Athiwaratkun, Andrew Gordon Wilson

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our approach provides state-of-the-art performance on the WORDNET hypernym relationship prediction task and the challenging HYPERLEX lexical entailment dataset while retaining a rich and interpretable probabilistic representation. We show quantitative results on the WORDNET Hypernym prediction task in Section 4.2 and a graded entailment dataset HYPERLEX in Section 4.4."
Researcher Affiliation | Academia | "Ben Athiwaratkun, Andrew Gordon Wilson, Cornell University, Ithaca, NY 14850, USA"
Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We make our code publicly available." (https://github.com/benathi/density-order-emb)
Open Datasets | Yes | "We have a similar data setup to the experiment by Vendrov et al. (2015) where we use the transitive closure of WORDNET noun hypernym relationships which contains 82,115 synsets and 837,888 hypernym pairs from 84,427 direct hypernym edges. We obtain the data using the WORDNET API of NLTK version 3.2.1 (Loper & Bird, 2002)."
Dataset Splits | Yes | "The validation set contains 4000 true hypernym relationships as well as 4000 false hypernym relationships where the false hypernym relationships are constructed from the S1 negative sampling described in Section 3.5."
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications.
Software Dependencies | Yes | "We obtain the data using the WORDNET API of NLTK version 3.2.1 (Loper & Bird, 2002). We use the Adam optimizer (Kingma & Ba, 2014)."
Experiment Setup | Yes | "We use d = 50 as the default dimension... We initialize the mean vectors to have a unit norm and normalize the mean vectors in the training graph. We initialize the diagonal variance components to be all equal to β and optimize on the unconstrained space of log(Σ). We use a minibatch size of 500 true hypernym pairs... We use the Adam optimizer (Kingma & Ba, 2014) and train our model for at most 20 epochs. The hyperparameters are the loss margin m, the initial variance scale β, and the energy threshold γ."
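The dataset construction quoted above (direct WordNet noun hypernym edges expanded into their transitive closure) can be sketched as follows. This is an illustrative reconstruction, not the authors' preprocessing script; the toy synset names stand in for the edges one would pull from NLTK.

```python
# Sketch of the paper's data construction: start from direct hypernym
# edges and take the transitive closure. With NLTK one would obtain the
# direct edges via something like:
#   from nltk.corpus import wordnet as wn
#   edges = [(s.name(), h.name()) for s in wn.all_synsets('n')
#            for h in s.hypernyms()]
# Here a small toy edge set stands in for WordNet.

def transitive_closure(edges):
    """All (descendant, ancestor) pairs implied by direct hypernym edges."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, set()).add(parent)
    closure = set()
    for child in parents:
        stack, seen = list(parents[child]), set()
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                closure.add((child, p))
                stack.extend(parents.get(p, ()))
    return closure

direct = [("dog.n.01", "canine.n.02"), ("canine.n.02", "carnivore.n.01")]
closed = transitive_closure(direct)
# The closure adds the implied pair (dog.n.01, carnivore.n.01).
```

Applied to the full WordNet noun hierarchy, this closure step is what grows the 84,427 direct edges into the 837,888 hypernym pairs cited in the paper.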
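The initialization described in the experiment setup (unit-norm means, diagonal variances set to a common scale β, optimized in the unconstrained space log(Σ)) can be sketched as below. The variable names and the concrete β value are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

# Sketch of the paper's initialization: unit-norm Gaussian means and
# diagonal variances all equal to beta, reparameterized as log(Sigma)
# so the optimizer works in an unconstrained space.
rng = np.random.default_rng(0)
n_synsets, d = 1000, 50   # d = 50 is the paper's default dimension
beta = 0.01               # illustrative value; beta is a tuned hyperparameter

mu = rng.normal(size=(n_synsets, d))
mu /= np.linalg.norm(mu, axis=1, keepdims=True)   # unit-norm mean vectors

log_sigma = np.full((n_synsets, d), np.log(beta))  # optimized freely
sigma = np.exp(log_sigma)                          # always positive
```

Optimizing log(Σ) rather than Σ itself is a standard trick: gradient steps (e.g. with Adam, as in the paper) can never push a variance below zero, since positivity is restored by the exponential.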