Hierarchical Density Order Embeddings
Authors: Ben Athiwaratkun, Andrew Gordon Wilson
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach provides state-of-the-art performance on the WORDNET hypernym relationship prediction task and the challenging HYPERLEX lexical entailment dataset while retaining a rich and interpretable probabilistic representation. We show quantitative results on the WORDNET Hypernym prediction task in Section 4.2 and a graded entailment dataset HYPERLEX in Section 4.4. |
| Researcher Affiliation | Academia | Ben Athiwaratkun and Andrew Gordon Wilson, Cornell University, Ithaca, NY 14850, USA |
| Pseudocode | No | The paper describes methods and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We make our code publicly available: https://github.com/benathi/density-order-emb |
| Open Datasets | Yes | We have a similar data setup to the experiment by Vendrov et al. (2015), where we use the transitive closure of WORDNET noun hypernym relationships, which contains 82,115 synsets and 837,888 hypernym pairs from 84,427 direct hypernym edges. We obtain the data using the WORDNET API of NLTK version 3.2.1 (Loper & Bird, 2002). (A sketch of this extraction follows the table.) |
| Dataset Splits | Yes | The validation set contains 4000 true hypernym relationships as well as 4000 false hypernym relationships where the false hypernym relationships are constructed from the S1 negative sampling described in Section 3.5. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | Yes | We obtain the data using the WORDNET API of NLTK version 3.2.1 (Loper & Bird, 2002). We use the Adam optimizer (Kingma & Ba, 2014). |
| Experiment Setup | Yes | We use d = 50 as the default dimension... We initialize the mean vectors to have a unit norm and normalize the mean vectors in the training graph. We initialize the diagonal variance components to be all equal to β and optimize in the unconstrained space of log(Σ). We use a minibatch size of 500 true hypernym pairs... We use the Adam optimizer (Kingma & Ba, 2014) and train our model for at most 20 epochs. The hyperparameters are the loss margin m, the initial variance scale β, and the energy threshold γ. (A parameterization sketch follows the table.) |
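As a reading aid, here is a minimal sketch of the data extraction quoted in the Open Datasets row, using the NLTK WordNet API. This is our illustration, not the authors' released code; exact counts depend on the WordNet/NLTK version (the paper used NLTK 3.2.1).

```python
# Hedged sketch: transitive closure of WordNet noun hypernym relationships.
# Not the authors' code; counts vary with the WordNet/NLTK version.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # fetch the WordNet data if missing

synsets = list(wn.all_synsets(pos="n"))  # all noun synsets
direct_edges = set()                     # direct hypernym edges
closure_pairs = set()                    # transitive-closure hypernym pairs

for s in synsets:
    for h in s.hypernyms():
        direct_edges.add((s.name(), h.name()))
    # closure() walks every ancestor reachable through the hypernym relation
    for h in s.closure(lambda x: x.hypernyms()):
        closure_pairs.add((s.name(), h.name()))

print(len(synsets), len(direct_edges), len(closure_pairs))
```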
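The Experiment Setup row describes the parameterization compactly; below is a hedged PyTorch sketch of just the initialization and optimizer it names (unit-norm means, diagonal covariance optimized as log(Σ), Adam). The energy function, the margin loss with margin m, the threshold γ, and the negative-sampling loop are omitted, and the variable names and the β value are ours.

```python
# Hedged sketch of the parameterization in the Experiment Setup row.
# Not the authors' implementation; names and the beta value are illustrative.
import math
import torch

d, n_synsets = 50, 82115  # default dimension; WordNet synset count
beta = 0.1                # initial variance scale (hyperparameter; value assumed)

# Mean vectors initialized to unit norm.
mu = torch.randn(n_synsets, d)
mu = mu / mu.norm(dim=1, keepdim=True)
mu.requires_grad_(True)

# Diagonal variances all equal to beta, optimized in the unconstrained
# space log(Sigma); exp() recovers a positive variance during training.
log_sigma = torch.full((n_synsets, d), math.log(beta), requires_grad=True)

optimizer = torch.optim.Adam([mu, log_sigma])
# Training (omitted): minibatches of 500 true hypernym pairs plus sampled
# negatives, a margin loss with margin m, for at most 20 epochs.
```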