Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions

Authors: Boris Muzellec, Marco Cuturi

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the advantages of elliptical embeddings by using them for visualization, to compute embeddings of words, and to reflect entailment or hypernymy.
Researcher Affiliation | Collaboration | Boris Muzellec (CREST, ENSAE, boris.muzellec@ensae.fr); Marco Cuturi (Google Brain and CREST, ENSAE, cuturi@google.com)
Pseudocode | Yes | Algorithm 1: Newton-Schulz (a sketch of this iteration follows the table).
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the methodology code is open-sourced.
Open Datasets | Yes | We train our embeddings on the concatenated ukWaC and WaCkypedia corpora [Baroni et al., 2009], consisting of about 3 billion tokens, on which we keep only the tokens appearing more than 100 times in the text (for a total number of 261583 different words). The paper also uses data from the PISA study (http://pisadataexplorer.oecd.org/ide/idepisa/). A sketch of the frequency filter follows the table.
Dataset Splits | No | The paper mentions using various datasets for training and evaluation (e.g., 'concatenated ukWaC and WaCkypedia corpora', 'similarity datasets', 'Entailment dataset') but does not specify the exact train/validation/test splits (e.g., percentages or sample counts) used for these datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions the use of 'adagrad' and 'Newton-Schulz iterations' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | Given a set R of positive word/context pairs of words (w, c), and for each input word a set N(w) of n negative context words sampled randomly, we adapt Vilnis and McCallum's loss function to the $W_2^2$ distance to minimize the following hinge loss: $\sum_{(w,c) \in R} \sum_{c' \in N(w)} \big[ M - [\mu_w : \mu_c] + [\mu_w : \mu_{c'}] \big]_+$, where M > 0 is a margin parameter. We train our embeddings on the concatenated ukWaC and WaCkypedia corpora [Baroni et al., 2009], consisting of about 3 billion tokens, on which we keep only the tokens appearing more than 100 times in the text (for a total number of 261583 different words). We train our embeddings using adagrad [Duchi et al., 2011], sampling one negative context per positive context, and, in order to prevent the norms of the embeddings from being too highly correlated with the corresponding word frequencies (see the figure in the supplementary material), we use two distinct sets of embeddings for the input and context words. A loss-computation sketch follows the table.
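
The Newton-Schulz routine cited in the Pseudocode row (Algorithm 1 in the paper) approximates matrix square roots and inverse square roots using matrix multiplications only, which keeps the computation GPU-friendly and easy to differentiate through. Below is a minimal NumPy sketch of the classical coupled Newton-Schulz iteration; the function name, the Frobenius-norm rescaling, and the iteration count are illustrative assumptions, not details reproduced from the paper.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=15, eps=1e-12):
    """Approximate A^{1/2} and A^{-1/2} for a symmetric PSD matrix A
    using the coupled Newton-Schulz iteration (matrix products only)."""
    d = A.shape[0]
    s = np.linalg.norm(A) + eps          # Frobenius-norm rescaling keeps the iteration in its convergence region
    Y, Z, I = A / s, np.eye(d), np.eye(d)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T                        # Y converges to (A/s)^{1/2}
        Z = T @ Z                        # Z converges to (A/s)^{-1/2}
    return Y * np.sqrt(s), Z / np.sqrt(s)

# Quick sanity check on a small PSD matrix.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
sqrt_A, inv_sqrt_A = newton_schulz_sqrt(A)
print(np.allclose(sqrt_A @ sqrt_A, A, atol=1e-5))               # expect True
print(np.allclose(sqrt_A @ inv_sqrt_A, np.eye(2), atol=1e-5))   # expect True
```

Because the iteration uses only matrix products, gradients can flow through it, which is what makes it attractive for learning the scale matrices of elliptical embeddings.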
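
The Open Datasets row quotes a concrete preprocessing rule: only tokens occurring more than 100 times in the roughly 3-billion-token corpus are kept. A minimal sketch of that frequency filter, assuming a plain whitespace-tokenized token stream (the file name and helper below are hypothetical):

```python
from collections import Counter

def build_vocab(token_stream, min_count=100):
    """Keep only tokens that appear more than `min_count` times.
    `token_stream` is any iterable of string tokens; in the quoted setup it
    would be the concatenated ukWaC + WaCkypedia corpora (~3 billion tokens)."""
    counts = Counter(token_stream)
    # Strictly more than `min_count` occurrences, matching the quoted setup.
    return {tok for tok, c in counts.items() if c > min_count}

# Hypothetical usage: corpus.txt stands in for the tokenized corpus.
# with open("corpus.txt") as f:
#     vocab = build_vocab(tok for line in f for tok in line.split())
```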
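
The Experiment Setup row combines the closed-form squared 2-Wasserstein (Bures) distance between elliptical embeddings with a Vilnis-and-McCallum-style max-margin loss over sampled negative contexts. The sketch below shows one way to assemble those pieces in NumPy/SciPy; using the negative $W_2^2$ distance as the similarity inside the hinge, averaging over negatives, the margin value, and all names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_sq(A, B):
    """Squared Bures distance between PSD scale matrices A and B:
    tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    root_A = np.real(sqrtm(A))
    cross = sqrtm(root_A @ B @ root_A)
    return np.trace(A) + np.trace(B) - 2.0 * np.real(np.trace(cross))

def w2_sq(mean_a, scale_a, mean_b, scale_b):
    """Closed-form squared 2-Wasserstein distance between two elliptical
    distributions parameterized by a mean vector and a PSD scale matrix."""
    return float(np.sum((mean_a - mean_b) ** 2) + bures_sq(scale_a, scale_b))

def hinge_loss(word, pos_ctx, neg_ctxs, margin=1.0):
    """Max-margin loss for one (word, positive context) pair against a list
    of sampled negative contexts; each embedding is a (mean, scale) tuple.
    Using -W_2^2 as the similarity inside the hinge is an assumption here."""
    d_pos = w2_sq(*word, *pos_ctx)
    return sum(max(0.0, margin + d_pos - w2_sq(*word, *neg))
               for neg in neg_ctxs) / len(neg_ctxs)

# Toy usage with 2-d elliptical embeddings and random PSD scale matrices.
rng = np.random.default_rng(0)
def rand_emb(d=2):
    L = rng.normal(size=(d, d))
    return rng.normal(size=d), L @ L.T

word, pos = rand_emb(), rand_emb()
print(hinge_loss(word, pos, [rand_emb() for _ in range(5)]))
```

In the quoted setup, the parameters would then be updated with adagrad, sampling one negative context per positive pair and keeping separate input and context embedding tables; only the loss computation is sketched above.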