Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions

Authors: Boris Muzellec, Marco Cuturi

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the advantages of elliptical embeddings by using them for visualization, to compute embeddings of words, and to reflect entailment or hypernymy.
Researcher Affiliation | Collaboration | Boris Muzellec (CREST, ENSAE, boris.muzellec@ensae.fr); Marco Cuturi (Google Brain and CREST, ENSAE, cuturi@google.com)
Pseudocode | Yes | Algorithm 1: Newton-Schulz (a sketch of this iteration follows the table).
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the methodology code is open-sourced.
Open Datasets | Yes | We train our embeddings on the concatenated ukWaC and WaCkypedia corpora [Baroni et al., 2009], consisting of about 3 billion tokens, on which we keep only the tokens appearing more than 100 times in the text (for a total number of 261583 different words). The paper also uses data from the PISA study (http://pisadataexplorer.oecd.org/ide/idepisa/). A sketch of the frequency filter follows the table.
Dataset Splits | No | The paper mentions using various datasets for training and evaluation (e.g., 'concatenated ukWaC and WaCkypedia corpora', 'similarity datasets', 'Entailment dataset') but does not specify the exact train/validation/test splits (e.g., percentages or sample counts) used for these datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions the use of 'adagrad' and 'Newton-Schulz iterations' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | Given a set R of positive word/context pairs of words (w, c), and for each input word a set N(w) of n negative context words sampled randomly, we adapt Vilnis and McCallum's loss function to the $W_2^2$ distance to minimize the following hinge loss: $\sum_{(w,c) \in R} \sum_{c' \in N(w)} \big[ M - [\mu_w : \mu_c] + [\mu_w : \mu_{c'}] \big]_+$, where M > 0 is a margin parameter. We train our embeddings on the concatenated ukWaC and WaCkypedia corpora [Baroni et al., 2009], consisting of about 3 billion tokens, on which we keep only the tokens appearing more than 100 times in the text (for a total number of 261583 different words). We train our embeddings using adagrad [Duchi et al., 2011], sampling one negative context per positive context, and, in order to prevent the norms of the embeddings from being too highly correlated with the corresponding word frequencies (see the figure in the supplementary material), we use two distinct sets of embeddings for the input and context words. A loss-computation sketch follows the table.
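
The Newton-Schulz routine cited in the Pseudocode row (Algorithm 1 in the paper) approximates matrix square roots and inverse square roots using matrix multiplications only, which keeps the computation GPU-friendly and easy to differentiate through. Below is a minimal NumPy sketch of the classical coupled Newton-Schulz iteration; the function name, the Frobenius-norm rescaling, and the iteration count are illustrative assumptions, not details reproduced from the paper.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=15, eps=1e-12):
    """Approximate A^{1/2} and A^{-1/2} for a symmetric PSD matrix A
    using the coupled Newton-Schulz iteration (matrix products only)."""
    d = A.shape[0]
    s = np.linalg.norm(A) + eps          # Frobenius-norm rescaling keeps the iteration in its convergence region
    Y, Z, I = A / s, np.eye(d), np.eye(d)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T                        # Y converges to (A/s)^{1/2}
        Z = T @ Z                        # Z converges to (A/s)^{-1/2}
    return Y * np.sqrt(s), Z / np.sqrt(s)

# Quick sanity check on a small PSD matrix.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
sqrt_A, inv_sqrt_A = newton_schulz_sqrt(A)
print(np.allclose(sqrt_A @ sqrt_A, A, atol=1e-5))               # expect True
print(np.allclose(sqrt_A @ inv_sqrt_A, np.eye(2), atol=1e-5))   # expect True
```

Because the iteration uses only matrix products, gradients can flow through it, which is what makes it attractive for learning the scale matrices of elliptical embeddings.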
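
The Open Datasets row quotes a concrete preprocessing rule: only tokens occurring more than 100 times in the roughly 3-billion-token corpus are kept. A minimal sketch of that frequency filter, assuming a plain whitespace-tokenized token stream (the file name and helper below are hypothetical):

```python
from collections import Counter

def build_vocab(token_stream, min_count=100):
    """Keep only tokens that appear more than `min_count` times.
    `token_stream` is any iterable of string tokens; in the quoted setup it
    would be the concatenated ukWaC + WaCkypedia corpora (~3 billion tokens)."""
    counts = Counter(token_stream)
    # Strictly more than `min_count` occurrences, matching the quoted setup.
    return {tok for tok, c in counts.items() if c > min_count}

# Hypothetical usage: corpus.txt stands in for the tokenized corpus.
# with open("corpus.txt") as f:
#     vocab = build_vocab(tok for line in f for tok in line.split())
```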
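
The Experiment Setup row combines the closed-form squared 2-Wasserstein (Bures) distance between elliptical embeddings with a Vilnis-and-McCallum-style max-margin loss over sampled negative contexts. The sketch below shows one way to assemble those pieces in NumPy/SciPy; using the negative $W_2^2$ distance as the similarity inside the hinge, averaging over negatives, the margin value, and all names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_sq(A, B):
    """Squared Bures distance between PSD scale matrices A and B:
    tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    root_A = np.real(sqrtm(A))
    cross = sqrtm(root_A @ B @ root_A)
    return np.trace(A) + np.trace(B) - 2.0 * np.real(np.trace(cross))

def w2_sq(mean_a, scale_a, mean_b, scale_b):
    """Closed-form squared 2-Wasserstein distance between two elliptical
    distributions parameterized by a mean vector and a PSD scale matrix."""
    return float(np.sum((mean_a - mean_b) ** 2) + bures_sq(scale_a, scale_b))

def hinge_loss(word, pos_ctx, neg_ctxs, margin=1.0):
    """Max-margin loss for one (word, positive context) pair against a list
    of sampled negative contexts; each embedding is a (mean, scale) tuple.
    Using -W_2^2 as the similarity inside the hinge is an assumption here."""
    d_pos = w2_sq(*word, *pos_ctx)
    return sum(max(0.0, margin + d_pos - w2_sq(*word, *neg))
               for neg in neg_ctxs) / len(neg_ctxs)

# Toy usage with 2-d elliptical embeddings and random PSD scale matrices.
rng = np.random.default_rng(0)
def rand_emb(d=2):
    L = rng.normal(size=(d, d))
    return rng.normal(size=d), L @ L.T

word, pos = rand_emb(), rand_emb()
print(hinge_loss(word, pos, [rand_emb() for _ in range(5)]))
```

In the quoted setup, the parameters would then be updated with adagrad, sampling one negative context per positive pair and keeping separate input and context embedding tables; only the loss computation is sketched above.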