Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On Efficient Multilevel Clustering via Wasserstein Distances

Authors: Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, Dinh Phung

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, experimental results with both synthetic and real data are presented to demonstrate the flexibility and scalability of the proposed approach."
Researcher Affiliation | Collaboration | Viet Huynh (Faculty of Information Technology, Monash University); Nhat Ho (Department of Statistics and Data Sciences, University of Texas at Austin); Nhan Dam (Faculty of Information Technology, Monash University); XuanLong Nguyen (Department of Statistics, University of Michigan); Mikhail Yurochkin (IBM Research); Hung Bui (VinAI Research); Dinh Phung (Faculty of Information Technology, Monash University)
Pseudocode | Yes | Algorithm 1: Multilevel Wasserstein Means (MWM); Algorithm 2: Multilevel Wasserstein Means with Sharing (MWMS); Algorithm 3: Multilevel Wasserstein Means with Context (MWMC); Algorithm 4: Multilevel Wasserstein Geometric Median (MWGM); Algorithm 5: MapReduce for Multilevel Wasserstein Means (MWM); Algorithm 6: Wasserstein barycenter under the entropic version of the W1 metric; Algorithm 7: Smoothed Primal T^γ and Dual b^γ Optima; Algorithm 8: Fixed-support Wasserstein barycenter; Algorithm 9: Free-support Wasserstein barycenter
Open Source Code | Yes | Code is available at https://github.com/viethhuynh/wasserstein-means
Open Datasets | Yes | The LabelMe dataset consists of 2,688 annotated images... (http://labelme.csail.mit.edu); the StudentLife dataset is a large dataset... (https://studentlife.cs.dartmouth.edu/dataset.html)
Dataset Splits | No | The paper uses synthetic and real-world datasets for its empirical studies, but does not explicitly provide training/validation/test splits, percentages, or a cross-validation methodology for reproducing the experiments. It describes generating synthetic data and using filtered real-world datasets (1,800 images from LabelMe, 49 documents from StudentLife) without specifying how these are partitioned for model evaluation.
Hardware Specification | Yes | "All experiments are conducted on the same machine (Windows 10 64-bit, Core i7 3.4GHz CPU and 16GB RAM)."
Software Dependencies | No | The paper mentions using the "Apache Spark framework" for its parallel implementation and a "GPU implementation of Cuturi's algorithms", but does not provide version numbers for these or any other key software dependencies required for reproducibility.
Experiment Setup | Yes | In our experiments, we used a fixed value of all entropic regularization parameters, τ = 10. For the regularization weight λ, we heuristically choose it to balance the global and local terms, i.e., $\lambda \approx W_2^2\bigl(H, \frac{1}{m}\sum_{j=1}^{m}\delta_{G_j}\bigr) \big/ \sum_{j=1}^{m} W_2^2\bigl(G_j, P_{n_j}^{j}\bigr)$.
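The entropic-regularized Wasserstein computations referenced above (Algorithms 6–9, and the fixed regularization τ = 10 in the experiment setup) can be illustrated with a minimal Sinkhorn iteration. This is a hedged sketch for exposition, not the authors' released code: the function name `sinkhorn_w2`, the fixed iteration count, and the squared-Euclidean cost are assumptions made here.

```python
import numpy as np

def sinkhorn_w2(a, b, X, Y, tau=10.0, n_iter=200):
    """Entropic-regularized squared-W2 cost between two discrete measures.

    a, b : weight vectors summing to 1, shapes (n,) and (m,)
    X, Y : support points, shapes (n, d) and (m, d)
    tau  : entropic regularization (the paper fixes tau = 10)
    """
    # Pairwise squared Euclidean cost matrix, shape (n, m).
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # Gibbs kernel from the entropic penalty.
    K = np.exp(-C / tau)
    # Alternate scaling updates (Sinkhorn iterations).
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    # Transport plan and its regularized transport cost.
    T = u[:, None] * K * v[None, :]
    return float((T * C).sum())
```

For instance, two unit point masses at 0 and 3 give a transport cost of 9 (the squared Euclidean distance), since a 1×1 plan is fully constrained by its marginals regardless of τ.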