Learning Embeddings into Entropic Wasserstein Spaces
Authors: Charlie Frogner, Farzaneh Mirzazadeh, Justin Solomon
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine empirically the representational capacity of our learned Wasserstein embeddings, showing that they can embed a wide variety of metric structures with smaller distortion than an equivalent Euclidean embedding. We empirically investigate two settings for Wasserstein embeddings. |
| Researcher Affiliation | Collaboration | Charlie Frogner (MIT CSAIL and MIT-IBM Watson AI Lab, frogner@mit.edu); Farzaneh Mirzazadeh (MIT-IBM Watson AI Lab and IBM Research, farzaneh@ibm.com); Justin Solomon (MIT CSAIL and MIT-IBM Watson AI Lab, jsolomon@mit.edu) |
| Pseudocode | No | The paper describes iterative processes and mathematical formulations, such as Equation 5 for the Sinkhorn iteration, but it does not provide any explicitly labeled pseudocode or algorithm blocks; a hedged sketch of such an iteration is given below the table. |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | The training dataset is Text8, a corpus of 17M tokens from Wikipedia that is commonly used as a language modeling benchmark, available from http://mattmahoney.net/dc/text8.zip |
| Dataset Splits | No | The paper mentions using Text8 as a 'training dataset' and evaluating on 'benchmark retrieval tasks' but does not specify the train/validation/test splits or a cross-validation strategy for its own experimental setup. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We choose a vocabulary of 8000 words and a context window size of l = 2 (i.e., 2 words on each side), λ = 0.05, number of epochs of 3, negative sampling rate of 1 per positive and Adam (Kingma & Ba, 2014) for optimization. (These settings are restated below as an illustrative configuration.) |
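
Since the paper provides no pseudocode, the following is a minimal, hypothetical sketch of a standard entropy-regularized Sinkhorn iteration of the kind the Pseudocode row refers to (Equation 5 in the paper). It is not the authors' implementation; the function name, the use of NumPy, and the default iteration count are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_cost(a, b, C, lam=0.05, n_iters=100):
    """Hypothetical sketch: entropy-regularized OT cost between histograms a and b.

    a, b : nonnegative weight vectors, each summing to 1
    C    : pairwise ground-cost matrix between the two supports
    lam  : entropic regularization strength (lambda = 0.05 matches the reported setup)
    """
    K = np.exp(-C / lam)                 # Gibbs kernel induced by the ground cost
    u = np.ones_like(a)
    for _ in range(n_iters):             # alternating Sinkhorn scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)      # approximate transport plan
    return np.sum(P * C)                 # regularized transport cost
```

In the paper, the embedding is trained by differentiating through iterations of this kind; the sketch above shows only the forward computation.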
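For quick reference, the hyperparameters reported in the Experiment Setup row can be restated as a small configuration dictionary. The key names below are illustrative only and do not come from any released code (the paper provides none).

```python
# Hypothetical configuration restating the reported word-embedding setup.
word_embedding_config = {
    "training_corpus": "text8",              # http://mattmahoney.net/dc/text8.zip
    "vocabulary_size": 8000,
    "context_window": 2,                     # l = 2 words on each side
    "entropic_regularization": 0.05,         # lambda
    "epochs": 3,
    "negative_samples_per_positive": 1,
    "optimizer": "Adam",                     # Kingma & Ba (2014)
}
```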