Word Representations via Gaussian Embedding
Authors: Luke Vilnis and Andrew McCallum
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare performance on various word embedding benchmarks, investigate the ability of these embeddings to model entailment and other asymmetric relationships, and explore novel properties of the representation. |
| Researcher Affiliation | Academia | Luke Vilnis, Andrew McCallum, School of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003, luke@cs.umass.edu, mccallum@cs.umass.edu |
| Pseudocode | No | The paper provides mathematical formulations and derivations but does not include pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Unsupervised embeddings are learned on the concatenated ukWaC and WaCkypedia corpora (Baroni et al., 2009), consisting of about 3 billion tokens. |
| Dataset Splits | No | The paper mentions training, testing, and evaluation on benchmarks, but does not specify a separate validation dataset or explicit train/validation/test splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper references word2vec-style training (Mikolov et al., 2013) but does not list the specific software libraries, frameworks, or version numbers used for the experiments. |
| Experiment Setup | Yes | All Gaussian experiments are conducted with 50-dimensional vectors, with diagonal variances except where noted otherwise. We train both models with one pass over the data, using separate embeddings for the input and output contexts, 1 negative sample per positive example, and the same subsampling procedure as in the word2vec paper (Mikolov et al., 2013). The only other difference between the two training regimes is that we use a smaller ℓ2 regularization constraint when using the word2vec loss function. |
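The Experiment Setup row above quotes the training configuration reported in the paper. As a minimal sketch, assuming Python/NumPy and purely illustrative names (the authors do not release code, and values such as the subsampling threshold and the ℓ2 constraint radius are assumptions, not quoted from the paper), those hyperparameters might be expressed as follows:

```python
# Hedged sketch (not the authors' implementation): the hyperparameters from the
# "Experiment Setup" row, expressed as a configuration for a diagonal-covariance
# Gaussian word-embedding model. `GaussianEmbeddingConfig`, `init_parameters`,
# and the subsampling / ℓ2 values are illustrative assumptions.
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianEmbeddingConfig:
    dim: int = 50                        # "50-dimensional vectors"
    diagonal_covariance: bool = True     # diagonal variances except where noted
    epochs: int = 1                      # one pass over the data
    negative_samples: int = 1            # 1 negative sample per positive example
    separate_context_embeddings: bool = True  # separate input and output embeddings
    subsample_threshold: float = 1e-5    # word2vec-style subsampling; exact value assumed
    l2_radius: float = 1.0               # ℓ2 constraint on mean norms; value assumed


def init_parameters(vocab_size: int, cfg: GaussianEmbeddingConfig, seed: int = 0):
    """Allocate means and diagonal log-variances for the input and output vocabularies."""
    rng = np.random.default_rng(seed)
    params = {}
    for side in ("input", "output"):     # separate embeddings for the two contexts
        params[side] = {
            "mu": rng.normal(scale=0.1, size=(vocab_size, cfg.dim)),
            "log_sigma2": np.zeros((vocab_size, cfg.dim)),  # unit variance at init
        }
    return params


if __name__ == "__main__":
    cfg = GaussianEmbeddingConfig()
    params = init_parameters(vocab_size=10_000, cfg=cfg)
    print(params["input"]["mu"].shape)          # (10000, 50)
    print(params["output"]["log_sigma2"].shape)  # (10000, 50)
```

With diagonal covariances, each word costs only 2 × 50 parameters per embedding table (50 for the mean and 50 for the variance), which keeps the separate input and output vocabularies affordable at the reported corpus scale.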