Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions
Authors: Boris Muzellec, Marco Cuturi
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantages of elliptical embeddings by using them for visualization, to compute embeddings of words, and to reflect entailment or hypernymy. |
| Researcher Affiliation | Collaboration | Boris Muzellec (CREST, ENSAE) boris.muzellec@ensae.fr; Marco Cuturi (Google Brain and CREST, ENSAE) cuturi@google.com |
| Pseudocode | Yes | Algorithm 1 Newton-Schulz (a hedged NumPy sketch of this iteration appears after the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the code for its methodology is open-sourced. |
| Open Datasets | Yes | We train our embeddings on the concatenated ukWaC and WaCkypedia corpora [Baroni et al., 2009], consisting of about 3 billion tokens, on which we keep only the tokens appearing more than 100 times in the text (for a total number of 261,583 different words); and the PISA study (http://pisadataexplorer.oecd.org/ide/idepisa/). |
| Dataset Splits | No | The paper mentions using various datasets for training and evaluation (e.g., 'concatenated uk Wa C and Wa Ckypedia corpora', 'similarity datasets', 'Entailment dataset') but does not specify the exact train/validation/test splits (e.g., percentages or sample counts) used for these datasets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions the use of 'adagrad' and 'Newton-Schulz iterations' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Given a set R of positive word/context pairs (w, c), and for each input word a set N(w) of n negative context words sampled randomly, we adapt Vilnis and McCallum's loss function to the $W_2^2$ distance and minimize the hinge loss $\sum_{(w,c)\in\mathcal{R}} \sum_{c'\in N(w)} \left[ M + W_2^2(\mu_w, \mu_c) - W_2^2(\mu_w, \mu_{c'}) \right]_+$, where $M > 0$ is a margin parameter. We train our embeddings on the concatenated ukWaC and WaCkypedia corpora [Baroni et al., 2009], consisting of about 3 billion tokens, on which we keep only the tokens appearing more than 100 times in the text (for a total number of 261,583 different words). We train our embeddings using adagrad [Duchi et al., 2011], sampling one negative context per positive context and, in order to prevent the norms of the embeddings from being too highly correlated with the corresponding word frequencies (see Figure in supplementary material), we use two distinct sets of embeddings for the input and context words. (A hedged sketch of the $W_2^2$ distance and this hinge loss appears after the table.) |
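
The Pseudocode row above refers to Algorithm 1 (Newton-Schulz). Below is a minimal NumPy sketch of the coupled Newton-Schulz iteration for the matrix square root and inverse square root, under the assumption that the paper uses the standard (Higham-style) formulation; the function name, Frobenius pre-scaling, and iteration count are illustrative choices, not the authors' code.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=15):
    """Approximate sqrt(A) and A**(-1/2) for an SPD matrix A with coupled
    Newton-Schulz iterations. Only matrix products are used, which is what
    makes this scheme attractive for GPU and automatic-differentiation use."""
    d = A.shape[0]
    norm = np.linalg.norm(A)           # Frobenius norm, used to pre-scale A
    Y = A / norm                       # Y_k converges to (A / norm)^{1/2}
    Z = np.eye(d)                      # Z_k converges to (A / norm)^{-1/2}
    I = np.eye(d)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    return Y * np.sqrt(norm), Z / np.sqrt(norm)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 5))
    A = X @ X.T + np.eye(5)            # a random SPD test matrix
    S, S_inv = newton_schulz_sqrt(A)
    print(np.allclose(S @ S, A, atol=1e-4),
          np.allclose(S @ S_inv, np.eye(5), atol=1e-4))
```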
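
The Experiment Setup row builds on the squared 2-Wasserstein (Bures) distance between elliptical embeddings, $W_2^2 = \lVert a - b\rVert^2 + \mathrm{Tr}(A) + \mathrm{Tr}(B) - 2\,\mathrm{Tr}\big((A^{1/2} B A^{1/2})^{1/2}\big)$ for means a, b and scale matrices A, B, plugged into a hinge loss over negative samples. The sketch below is a hedged illustration of that pipeline: scipy.linalg.sqrtm stands in for the paper's Newton-Schulz iterations, and the loss follows the reconstruction quoted in the table row, so its exact sign conventions should be checked against the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_sq(a, A, b, B):
    """Squared 2-Wasserstein distance between elliptical distributions with
    means a, b and scale matrices A, B (Bures metric on the scale part)."""
    sA = np.real(sqrtm(A))                     # exact square root, for clarity
    cross = sqrtm(sA @ B @ sA)
    bures = np.trace(A) + np.trace(B) - 2.0 * np.real(np.trace(cross))
    return float(np.sum((a - b) ** 2) + bures)

def hinge_loss(mu_w, mu_c, negatives, margin):
    """Hinge loss for one positive pair (w, c) and its sampled negative
    contexts:  sum_{c' in N(w)} [ margin + W2^2(w, c) - W2^2(w, c') ]_+ .
    Each embedding is passed as a (mean, scale_matrix) pair."""
    pos = w2_sq(*mu_w, *mu_c)
    return sum(max(0.0, margin + pos - w2_sq(*mu_w, *mu_neg))
               for mu_neg in negatives)
```

As quoted above, the paper trains with adagrad, samples one negative context per positive pair, and keeps separate input and context embedding sets, so a training loop would accumulate this per-pair loss over the positive pairs of the corpus.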