Learning Embeddings into Entropic Wasserstein Spaces

Authors: Charlie Frogner, Farzaneh Mirzazadeh, Justin Solomon

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We examine empirically the representational capacity of our learned Wasserstein embeddings, showing that they can embed a wide variety of metric structures with smaller distortion than an equivalent Euclidean embedding. We empirically investigate two settings for Wasserstein embeddings."
Researcher Affiliation | Collaboration | Charlie Frogner (MIT CSAIL and MIT-IBM Watson AI Lab, frogner@mit.edu); Farzaneh Mirzazadeh (MIT-IBM Watson AI Lab and IBM Research, farzaneh@ibm.com); Justin Solomon (MIT CSAIL and MIT-IBM Watson AI Lab, jsolomon@mit.edu)
Pseudocode | No | The paper describes iterative procedures and mathematical formulations, such as Equation 5 for the Sinkhorn iteration, but it provides no explicitly labeled pseudocode or algorithm blocks. A hedged sketch of the Sinkhorn iteration follows the table.
Open Source Code | No | The paper neither states that its source code is publicly available nor links to a code repository.
Open Datasets | Yes | The training dataset is Text8, a corpus of roughly 17M tokens from Wikipedia that is commonly used as a language-modeling benchmark, available from http://mattmahoney.net/dc/text8.zip. A download sketch follows the table.
Dataset Splits | No | The paper mentions using Text8 as a training dataset and evaluating on benchmark retrieval tasks, but it specifies neither train/validation/test splits nor a cross-validation strategy for its own experiments.
Hardware Specification | No | The paper gives no details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions the Adam optimizer, but it provides no version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We choose a vocabulary of 8000 words and a context window size of l = 2 (i.e., 2 words on each side), λ = 0.05, number of epochs of 3, negative sampling rate of 1 per positive and Adam (Kingma & Ba, 2014) for optimization." A configuration sketch follows the table.
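
Because the paper gives the Sinkhorn iteration only as an equation (Equation 5) and no pseudocode, the following is a minimal NumPy sketch of the standard entropic-regularized Sinkhorn scheme. The function name, fixed iteration count, and lack of convergence checks are our assumptions, not the authors' implementation.

```python
import numpy as np

def sinkhorn(C, a, b, lam=0.05, n_iters=200):
    """Entropic-regularized optimal transport via Sinkhorn iterations.

    A minimal sketch, not the authors' code. C is the (n, m) ground cost
    matrix; a and b are source/target marginals (nonnegative, summing to 1);
    lam is the entropic regularization weight (the paper reports 0.05).
    """
    K = np.exp(-C / lam)                  # Gibbs kernel
    u = np.full(len(a), 1.0 / len(a))     # initial row scaling
    for _ in range(n_iters):
        v = b / (K.T @ u)                 # rescale columns to match b
        u = a / (K @ v)                   # rescale rows to match a
    P = u[:, None] * K * v[None, :]       # regularized transport plan
    return float((P * C).sum()), P

# Example: squared-Euclidean cost between two uniform point clouds in R^2.
x, y = np.random.randn(5, 2), np.random.randn(7, 2)
C = np.linalg.norm(x[:, None] - y[None, :], axis=-1) ** 2
cost, plan = sinkhorn(C, np.full(5, 1 / 5), np.full(7, 1 / 7))
```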
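The Text8 corpus is available from the URL the paper cites. A small sketch of fetching and tokenizing it, assuming the archive contains the single file `text8` as the upstream distribution does:

```python
import io
import urllib.request
import zipfile

URL = "http://mattmahoney.net/dc/text8.zip"  # URL given in the paper

# Download the archive into memory and extract the single "text8" member.
with urllib.request.urlopen(URL) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))
tokens = archive.read("text8").decode("utf-8").split()  # ~17M tokens
print(len(tokens))
```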
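The reported experiment setup translates naturally into a skip-gram-style data pipeline. The sketch below encodes the stated hyperparameters; the vocabulary-selection rule (keeping the most frequent words) and the uniform negative-sampling distribution are assumptions the paper does not confirm.

```python
import collections
import random

# Hyperparameters reported in the paper's experiment setup.
VOCAB_SIZE  = 8000   # vocabulary of 8000 words (assumed: the most frequent)
WINDOW      = 2      # context window l = 2 (2 words on each side)
LAMBDA      = 0.05   # entropic regularization, passed to the Sinkhorn routine
EPOCHS      = 3
NEG_PER_POS = 1      # one negative sample per positive pair

def build_vocab(tokens, size=VOCAB_SIZE):
    """Map the `size` most frequent tokens to integer ids."""
    counts = collections.Counter(tokens)
    return {w: i for i, (w, _) in enumerate(counts.most_common(size))}

def training_pairs(tokens, vocab, window=WINDOW, neg=NEG_PER_POS):
    """Yield (target, context, label) triples; OOV words are dropped.

    Uniform negative sampling is an assumption; the paper does not
    specify the noise distribution.
    """
    ids = [vocab[w] for w in tokens if w in vocab]
    for i, target in enumerate(ids):
        lo, hi = max(0, i - window), min(len(ids), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            yield target, ids[j], 1                            # positive
            for _ in range(neg):
                yield target, random.randrange(len(vocab)), 0  # negative
```

Training would then make EPOCHS passes over these triples with Adam; the paper does not report Adam's learning rate or other optimizer settings.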