Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable Approximation Algorithms for $p$-Wasserstein Distance and Its Variants
Authors: Nathaniel Lahn, Sharath Raghvendra, Emma Saarinen, Pouyan Shirzadian
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the empirical analysis of both our tree-based distance from Section 2.2 as well as the algorithm from Section 3. All computations are performed on a computer with a 2.6 GHz 6-Core Intel Core i7 CPU and 16 GB RAM, using a single calculation thread. The code is available at https://github.com/pouyansh/Faster-pWasserstein. ... Results: We present our results in three parts: Tree Distance Accuracy: For two sets of n points from the Uniform dataset, we compute the average distortion of the tree distance across all pairs of points. As depicted in Figure 1(f), the average distortion does not significantly change as a function of n, while it increases exponentially as p increases, which is in line with our result in Lemma 2.1. Algorithm's Accuracy: For a given value of n and p and both datasets, we executed our algorithm on n samples A and B from the dataset and computed the ratio of our matching cost to the optimal matching cost. For each value of n and p, we report the average ratio over 15 executions. As shown in Figure 1(a) and (d), as n grows, the approximation factors slightly increase; however, the approximation factors do not change as a function of p. |
| Researcher Affiliation | Academia | Nathaniel Lahn 1 Sharath Raghvendra 2 Emma Saarinen 2 Pouyan Shirzadian 3 Authors are listed Alphabetically 1Radford University 2North Carolina State University 3Virginia Tech. Correspondence to: N. Lahn <EMAIL>, S. Raghvendra <EMAIL>, E. Saarinen <EMAIL>, P. Shirzadian <EMAIL>. |
| Pseudocode | No | The paper describes algorithms like the Hungarian algorithm and modifications to the LMR algorithm in natural language and mathematical notation within sections such as "3.2. Hungarian Algorithm" and "4.2. Algorithm for RPW distance", but does not feature a distinct, labeled pseudocode block or algorithm figure. |
| Open Source Code | Yes | The code is available at https://github.com/pouyansh/Faster-pWasserstein. ... Implementation can be found at https://github.com/saarinenemma/Faster-RPW. |
| Open Datasets | No | Datasets: We perform experiments on two datasets, namely (i) samples drawn from the uniform distribution inside the 10-dimensional unit hypercube (Uniform dataset), and (ii) samples drawn from a truncated normal distribution inside the 10-dimensional unit hypercube (Normal dataset). |
| Dataset Splits | Yes | For two sets of n points from the Uniform dataset, we compute the average distortion of the tree distance across all pairs of points. ... For two sets of n points, one sampled from the left half of the unit square and the second drawn from the right, we execute both LMR-RPW and our RPW algorithms to find the p-RPW distance between the datasets. |
| Hardware Specification | Yes | All computations are performed on a computer with a 2.6 GHz 6-Core Intel Core i7 CPU and 16 GB RAM, using a single calculation thread. ... All experiments were performed on North Carolina State University's computing cluster (ARC), on a single calculation CPU thread. |
| Software Dependencies | No | The paper states that the code is available at GitHub but does not list specific software libraries or their version numbers (e.g., Python, PyTorch, numpy, etc.). |
| Experiment Setup | Yes | For a given value of n and p and both datasets, we executed our algorithm on n samples A and B from the dataset and computed the ratio of our matching cost to the optimal matching cost. For each value of n and p, we report the average ratio over 15 executions. ... For p = 4, we varied the number of HSTs m from 1 to 6 and measured the approximation ratio of the computed matching. ... For δ = 0.1 and any n ∈ {1000, 2000, 5000}, we compute the p-RPW distance between n samples from the left half and n samples from the right half of the unit square for increasing values of p. ... For n = 1000 and p = 3, we compute the p-RPW distance between two sets of n samples drawn from each half of the unit square for δ ∈ {0.1, 0.08, 0.06, 0.04}. |
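The accuracy experiment quoted above (ratio of an approximate matching cost to the optimal matching cost, on uniform samples from the 10-dimensional unit hypercube) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the exact p-Wasserstein cost is computed with `scipy.optimize.linear_sum_assignment`, and a greedy nearest-neighbor matching stands in as a hypothetical approximation whose ratio is measured the same way.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def p_wasserstein(A, B, p):
    """Exact p-Wasserstein distance between equal-size point sets with
    uniform weights, via an optimal assignment on the |a - b|^p cost matrix."""
    C = cdist(A, B) ** p
    rows, cols = linear_sum_assignment(C)  # minimizes total matching cost
    return C[rows, cols].mean() ** (1.0 / p)


def greedy_match_cost(A, B, p):
    """Cost of a greedy nearest-neighbor perfect matching (illustrative
    stand-in for an approximation algorithm, not the paper's method)."""
    C = cdist(A, B) ** p
    used = np.zeros(len(B), dtype=bool)
    total = 0.0
    for i in range(len(A)):
        masked = np.where(used, np.inf, C[i])  # only unmatched points of B
        j = int(np.argmin(masked))
        used[j] = True
        total += C[i, j]
    return (total / len(A)) ** (1.0 / p)


rng = np.random.default_rng(0)
n, d, p = 200, 10, 3                      # Uniform-dataset stand-in: 10-dim cube
A = rng.uniform(size=(n, d))
B = rng.uniform(size=(n, d))

opt = p_wasserstein(A, B, p)
approx = greedy_match_cost(A, B, p)
ratio = approx / opt                      # >= 1, since the assignment is optimal
```

Averaging `ratio` over repeated draws (15 executions per (n, p) in the quoted setup) reproduces the shape of the reported experiment.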