Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fast Computation of Wasserstein Barycenters

Authors: Marco Cuturi, Arnaud Doucet

ICML 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use these algorithms to visualize a large family of images and to solve a constrained clustering problem. |
| Researcher Affiliation | Academia | Marco Cuturi EMAIL Graduate School of Informatics, Kyoto University; Arnaud Doucet EMAIL Department of Statistics, University of Oxford |
| Pseudocode | Yes | Algorithm 1 Wasserstein Barycenter in P(X, Θ), Algorithm 2 2-Wasserstein Barycenter in P_k(R^d, Θ), Algorithm 3 Smoothed Primal T_λ and Dual α |
| Open Source Code | No | The paper does not provide any links to its own source code or explicitly state that the code is being released. |
| Open Datasets | Yes | We use 50.000 images of the MNIST database, with approximately 5.000 images for each digit from 0 to 9. |
| Dataset Splits | No | The paper describes using the MNIST database and US census data, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts for each split) for model training or evaluation in the traditional machine learning sense. |
| Hardware Specification | Yes | Using a Quadro K5000 GPU with close to 1500 cores, the computation of a single barycenter takes about 2 hours to reach 100 iterations. [...] On a single CPU core, these computations require 12.5 seconds for the constrained case, using Sinkhorn's approximation, and 1.55 seconds for the regular k-means algorithm. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific programming languages, libraries, or solvers with their versions). |
| Experiment Setup | Yes | λ is set to 60/median(M), where M is the squared-Euclidean distance matrix between all 2,500 pixels in the grid. [...] display intermediate barycenter solutions for each of these 10 datasets of images for t = 1, 10, 60 gradient iterations. |
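
The Experiment Setup entry fixes the smoothing parameter as λ = 60/median(M), and the Hardware Specification entry mentions Sinkhorn's approximation. The sketch below illustrates both under stated assumptions; it is not the authors' code, which the summary reports as unreleased. The function names (`grid_cost_matrix`, `smoothing_parameter`, `sinkhorn_plan`), the iteration count, and the uniform initialization of `u` are hypothetical choices made for illustration, and the plan is computed with the standard Sinkhorn fixed-point iteration rather than a transcription of the paper's Algorithm 3.

```python
import numpy as np


def grid_cost_matrix(side):
    """Squared-Euclidean distances between all pixels of a side x side grid."""
    ys, xs = np.meshgrid(np.arange(side), np.arange(side), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # Pairwise differences between pixel coordinates, then squared norms.
    diff = coords[:, None, :] - coords[None, :, :]
    return (diff ** 2).sum(axis=2)


def smoothing_parameter(M, scale=60.0):
    """lambda = scale / median(M), following the quoted experiment setup."""
    return scale / np.median(M)


def sinkhorn_plan(a, b, M, lam, n_iter=500):
    """Entropy-smoothed transport plan between histograms a and b.

    Standard Sinkhorn fixed-point iteration; n_iter is an assumed budget,
    not a value taken from the paper. Both histograms are assumed to be
    strictly positive and to sum to one.
    """
    K = np.exp(-lam * M)             # elementwise Gibbs kernel
    u = np.ones_like(a) / a.size     # assumed uniform initialization
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale to match column marginal b
        u = a / (K @ v)              # rescale to match row marginal a
    return u[:, None] * K * v[None, :]   # T = diag(u) K diag(v)
```

Calling `grid_cost_matrix(50)` gives a 2,500 x 2,500 matrix, which matches the 2,500 pixels quoted above if the grid is 50 x 50 (an inference from 2,500 = 50², not a figure stated in the summary). The intermediate difference array is large at that size, so a smaller `side` is more convenient for experimentation.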