Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Embeddings into Entropic Wasserstein Spaces
Authors: Charlie Frogner, Farzaneh Mirzazadeh, Justin Solomon
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine empirically the representational capacity of our learned Wasserstein embeddings, showing that they can embed a wide variety of metric structures with smaller distortion than an equivalent Euclidean embedding. We empirically investigate two settings for Wasserstein embeddings. |
| Researcher Affiliation | Collaboration | Charlie Frogner (MIT CSAIL and MIT-IBM Watson AI Lab); Farzaneh Mirzazadeh (MIT-IBM Watson AI Lab and IBM Research); Justin Solomon (MIT CSAIL and MIT-IBM Watson AI Lab) |
| Pseudocode | No | The paper describes iterative processes and mathematical formulations, such as Equation 5 for Sinkhorn iteration, but it does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | The training dataset is Text8 [4], which consists of a corpus with 17M tokens from Wikipedia and is commonly used as a language modeling benchmark. [4] From http://mattmahoney.net/dc/text8.zip |
| Dataset Splits | No | The paper mentions using Text8 as a 'training dataset' and evaluating on 'benchmark retrieval tasks' but does not specify the train/validation/test splits or a cross-validation strategy for its own experimental setup. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We choose a vocabulary of 8000 words and a context window size of l = 2 (i.e., 2 words on each side), λ = 0.05, number of epochs of 3, negative sampling rate of 1 per positive and Adam (Kingma & Ba, 2014) for optimization. |
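
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object, and the context window and negative sampling rate can be illustrated on a toy sentence. This is a minimal sketch and not the authors' code (the table notes that none was released): the names `SetupConfig`, `context_pairs`, and `with_negatives` are hypothetical, the role of λ is not stated in the row and is stored as quoted, and uniform negative sampling is an assumption made only for illustration.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class SetupConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    vocab_size: int = 8000
    context_window: int = 2          # l = 2: two words on each side
    lam: float = 0.05                # λ as quoted; its role is not given in the row
    epochs: int = 3
    negatives_per_positive: int = 1
    optimizer: str = "adam"          # Adam (Kingma & Ba, 2014)


def context_pairs(tokens, window):
    """Return (center, context) pairs within `window` positions on each side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def with_negatives(pairs, vocab, k, rng):
    """Label each positive pair 1 and add k sampled negatives labeled 0.

    Assumption: negatives are drawn uniformly from `vocab`, so a negative
    may occasionally coincide with a true context word.
    """
    examples = []
    for center, context in pairs:
        examples.append((center, context, 1))
        for _ in range(k):
            examples.append((center, rng.choice(vocab), 0))
    return examples


if __name__ == "__main__":
    cfg = SetupConfig()
    toy = ["a", "b", "c", "d", "e"]
    positives = context_pairs(toy, cfg.context_window)
    examples = with_negatives(positives, toy, cfg.negatives_per_positive, random.Random(0))
    print(len(positives), len(examples))
```

With one negative per positive, the example list is twice as long as the list of positive pairs, which matches the quoted sampling rate.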