Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Approximating 1-Wasserstein Distance with Trees

Authors: Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi

TMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT. ... We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets. ... Tables 1, 2, and 3 present the experimental results for the Twitter, BBCSport, and Amazon datasets, respectively. Evidently, the qTWD and cTWD can obtain a small MAE and accurately approximate the original 1-Wasserstein distance.
Researcher Affiliation	Collaboration	Makoto Yamada EMAIL Okinawa Institute of Science and Technology, Kyoto University, RIKEN AIP ... Zornitsa Kozareva EMAIL Meta AI ... Sujith Ravi EMAIL SliceX AI
Pseudocode	Yes	Algorithm 1 Sliced weight estimation with trees
Open Source Code	Yes	Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT.
Open Datasets	Yes	We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets (footnote 2). ... Footnote 2: https://github.com/gaohuang/S-WMD
Dataset Splits	Yes	We evaluated the document classification experiment tasks with the 1-Wasserstein distance (Sinkhorn algorithm), Quad Tree, Cluster Tree, qTWD, cTWD, and their sliced counterparts. For the proposed method, we set the regularization parameter to λ = 10⁻³. In this experiment, we used 90% of the samples for training and the remaining samples for the test. We then used the ten-nearest-neighbor classification method.
Hardware Specification	Yes	We evaluated all the methods using Xeon CPU E5-2690 v4 (2.60 GHz) and Xeon CPU E7-8890 v4 (2.20 GHz). ... For training, we measured the average computational cost using a Xeon CPU E5-2690 v4 (2.60 GHz) ... For the test, we ran all the methods with an A6000 GPU.
Software Dependencies	No	For the 1-Wasserstein distance, we used the Python optimal transport (POT) package (footnote 1). ... We used SPAMS to solve the Lasso problem (footnote 3). ... Footnote 1: https://pythonot.github.io/index.html ... Footnote 3: http://thoth.inrialpes.fr/people/mairal/spams/
Experiment Setup	Yes	For the Cluster Tree, we set the number of clusters K = 5 for all experiments. ... For the proposed methods, we selected the regularization parameter from {10⁻³, 10⁻², 10⁻¹}. ... In this experiment, we set the number of slices to T = 3 ... For the Sinkhorn algorithm, we set the regularization parameter to 10⁻² and the maximum number of iterations to 100. ... For the proposed method, we set the regularization parameter to λ = 10⁻³. ... We then used the ten-nearest-neighbor classification method.
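For readers checking the method, the tree-Wasserstein distance (TWD) evaluated in this paper can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the treeOT repository: it assumes the tree is given as a parent array (root marked with -1) and that each edge weight is stored at the edge's child node. Following the standard TWD definition, the distance is the weighted sum over edges of the absolute difference in mass that the two distributions place in the subtree below each edge; the sliced variant averages this over several trees (the reported setup uses T = 3). The function names are hypothetical.

```python
import numpy as np

def node_path_matrix(parent):
    """Hypothetical helper: B[e, v] = 1 if node v lies in the subtree rooted at e.

    parent[v] is the parent of node v; the root has parent -1. Row e of B then
    selects the mass below the edge that connects e to its parent.
    """
    n = len(parent)
    B = np.zeros((n, n))
    for v in range(n):
        u = v
        while u != -1:          # walk from v up to the root
            B[u, v] = 1.0
            u = parent[u]
    return B

def tree_wasserstein(a, b, B, w):
    """TWD = sum_e w[e] * |a(subtree_e) - b(subtree_e)|.

    The root row of B is all ones, so for probability vectors a and b it
    contributes 0 and the root's weight never matters.
    """
    return float(w @ np.abs(B @ (a - b)))

def sliced_tree_wasserstein(a, b, trees):
    """Average the TWD over several trees, each given as a (B, w) pair."""
    return float(np.mean([tree_wasserstein(a, b, B, w) for B, w in trees]))
```

For example, on the path tree 0-1-2 with unit edge weights, moving all mass from node 0 to node 2 costs 2, the length of the path between them.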
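The paper reports solving the weight-estimation Lasso problem with SPAMS. As a dependency-free stand-in, the sketch below fits non-negative edge weights by projected proximal gradient (ISTA). It assumes the objective 0.5·||y − Zw||² + λ·Σw with w ≥ 0, where each row of Z holds the per-edge absolute subtree-mass differences |B(a_i − b_i)| for one training pair and y holds the pairs' reference 1-Wasserstein distances; the exact scaling of the objective in the paper may differ, and the function name is hypothetical.

```python
import numpy as np

def fit_tree_weights(Z, y, lam=1e-3, n_iter=5000):
    """Hypothetical non-negative Lasso solver (projected ISTA) for edge weights.

    Minimizes 0.5 * ||y - Z @ w||^2 + lam * sum(w) subject to w >= 0, where
    row i of Z is |B @ (a_i - b_i)| for training pair i and y[i] is that
    pair's reference 1-Wasserstein distance.
    """
    # Step size 1/L, where L = ||Z||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(Z, 2) ** 2
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ w - y)
        # Prox of lam * sum(w) plus the constraint w >= 0: shift down, then clip at 0.
        w = np.maximum(w - step * (grad + lam), 0.0)
    return w
```

On synthetic features whose targets are exactly Z @ w_true, a small λ such as 10⁻³ recovers w_true closely; increasing λ drives more edge weights to exactly zero, pruning the tree.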
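The reference distances come from the Sinkhorn algorithm in POT with regularization 10⁻² and at most 100 iterations; in POT this presumably corresponds to a call such as ot.sinkhorn2(a, b, M, 1e-2, numItermax=100). The NumPy sketch below shows what that setting computes, assuming the cost matrix is scaled so that exp(-M / reg) does not underflow (a log-domain variant is more robust otherwise). It runs a fixed number of iterations without a convergence check, and the function name is hypothetical.

```python
import numpy as np

def sinkhorn_cost(a, b, M, reg=1e-2, num_iter=100):
    """Entropic OT with a fixed number of Sinkhorn iterations.

    Scales the Gibbs kernel K = exp(-M / reg) so that the plan
    P = diag(u) @ K @ diag(v) has row sums a and column sums b, then returns
    the linear transport cost <P, M> (without the entropy term).
    """
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(num_iter):
        u = a / (K @ v)      # rescale rows to match marginal a
        v = b / (K.T @ u)    # rescale columns to match marginal b
    P = u[:, None] * K * v[None, :]
    return float(np.sum(P * M))
```

For points 0, 0.5, and 1 on a line with a = (0.5, 0.5, 0) and b = (0, 0.5, 0.5), the exact 1-Wasserstein distance is 0.5, and this sketch returns approximately that value.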
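The classification protocol (90% of the samples for training, the rest for testing, ten-nearest-neighbor classification) can be evaluated from any precomputed document distance matrix, such as TWD or Sinkhorn distances. The sketch below assumes a single random split with a fixed seed and plain majority voting with ties broken toward the smallest label; the paper's exact split procedure and tie-breaking are not stated in the evidence above, and the function name is hypothetical.

```python
import numpy as np

def knn_accuracy(D, labels, train_frac=0.9, k=10, seed=0):
    """k-NN accuracy from a precomputed pairwise distance matrix D.

    labels must be a NumPy array of non-negative integer class ids.
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    train, test = perm[:n_train], perm[n_train:]
    correct = 0
    for i in test:
        # Indices of the k training documents closest to test document i.
        nearest = train[np.argsort(D[i, train])[:k]]
        # Majority vote; np.argmax picks the smallest label on ties.
        votes = np.bincount(labels[nearest])
        correct += int(np.argmax(votes) == labels[i])
    return correct / len(test)
```

Because only the distance matrix enters the classifier, the same function can compare the 1-Wasserstein baseline against any tree-based approximation on identical splits.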