Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Approximating 1-Wasserstein Distance with Trees
Authors: Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT. ... We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets. ... Tables 1, 2, and 3 present the experimental results for the Twitter, BBCSport, and Amazon datasets, respectively. Evidently, the qTWD and cTWD can obtain a small MAE and accurately approximate the original 1-Wasserstein distance. |
| Researcher Affiliation | Collaboration | Makoto Yamada EMAIL Okinawa Institute of Science and Technology, Kyoto University, RIKEN AIP ... Zornitsa Kozareva EMAIL Meta AI ... Sujith Ravi EMAIL SliceX AI |
| Pseudocode | Yes | Algorithm 1: Sliced weight estimation with trees |
| Open Source Code | Yes | Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code can be found in https://github.com/oist/treeOT. |
| Open Datasets | Yes | We evaluated our proposed methods using the Twitter, BBCSport, and Amazon datasets². ... ² https://github.com/gaohuang/S-WMD |
| Dataset Splits | Yes | We evaluated the document classification task with the 1-Wasserstein distance (Sinkhorn algorithm), Quad Tree, Cluster Tree, qTWD, cTWD, and their sliced counterparts. For the proposed method, we set the regularization parameter to λ = 10⁻³. In this experiment, we used 90% of the samples for training and the remaining samples for testing. We then used ten-nearest-neighbor (10-NN) classification. |
| Hardware Specification | Yes | We evaluated all the methods using Xeon CPU E5-2690 v4 (2.60 GHz) and Xeon CPU E7-8890 v4 (2.20 GHz). ... For training, we measured the average computational cost using a Xeon CPU E5-2690 v4 (2.60 GHz) ... For the test, we ran all the methods with an A6000 GPU. |
| Software Dependencies | No | For the 1-Wasserstein distance, we used the Python optimal transport (POT) package¹. ... We used SPAMS to solve the Lasso problem³. ... ¹ https://pythonot.github.io/index.html ... ³ http://thoth.inrialpes.fr/people/mairal/spams/ |
| Experiment Setup | Yes | For the Cluster Tree, we set the number of clusters to K = 5 for all experiments. ... For the proposed methods, we selected the regularization parameter from {10⁻³, 10⁻², 10⁻¹}. ... In this experiment, we set the number of slices to T = 3. ... For the Sinkhorn algorithm, we set the regularization parameter to 10⁻² and the maximum number of iterations to 100. ... For the proposed method, we set the regularization parameter to λ = 10⁻³. ... We then used ten-nearest-neighbor (10-NN) classification. |
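The TWD referred to above has a well-known closed form: for distributions μ and ν on the nodes of a tree, with w_v the length of the edge from node v to its parent and Γ(v) the subtree rooted at v, the 1-Wasserstein distance under the tree metric is Σ_v w_v |μ(Γ(v)) − ν(Γ(v))|. The NumPy sketch below computes that sum directly. It is an illustration written for this page, not code from the treeOT repository; the function names and the parent-array tree encoding are assumptions. The sliced variants are understood to average this quantity over T independently built trees.

```python
import numpy as np

def subtree_matrix(parent):
    """Return B with B[v, u] = 1 if node u lies in the subtree rooted at v.

    parent[v] is the index of v's parent; the root has parent -1.
    (Hypothetical encoding chosen for this sketch.)
    """
    n = len(parent)
    B = np.zeros((n, n))
    for u in range(n):
        v = u
        while v != -1:  # walk from u up to the root, marking every ancestor
            B[v, u] = 1.0
            v = parent[v]
    return B

def tree_wasserstein(a, b, parent, w):
    """Closed-form 1-Wasserstein distance under a tree metric.

    a, b: probability vectors over the tree nodes.
    w[v]: length of the edge from v to its parent (ignored for the root).
    """
    B = subtree_matrix(parent)
    mass_diff = np.abs(B @ (a - b))  # |a(subtree(v)) - b(subtree(v))| for every v
    has_edge = np.array([p != -1 for p in parent], dtype=float)  # root has no parent edge
    return float(np.sum(w * has_edge * mass_diff))

# Path 0 - 1 - 2 rooted at 0, with edge lengths 1 (0-1) and 2 (1-2).
# All mass moves from node 0 to node 2, so the distance is 1 + 2 = 3.
d = tree_wasserstein(np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]),
                     [-1, 0, 1], np.array([0.0, 1.0, 2.0]))
```

Evaluating the distance this way costs one pass over the edges, which is why tree-based approximations are attractive compared with solving a full transport problem.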
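The source states that a Lasso problem is solved with SPAMS to estimate tree weights, with λ = 10⁻³. One plausible formulation, which is an assumption here rather than the paper's stated objective, regresses known pairwise distances d_ij onto path indicators: the tree distance between nodes i and j is the sum of w_v over the edges on their path, and the weights are kept nonnegative. The sketch below swaps SPAMS for a hand-written projected ISTA solver in NumPy; all function names are hypothetical.

```python
import numpy as np

def subtree_matrix(parent):
    """B[v, u] = 1 if node u lies in the subtree rooted at v (root has parent -1)."""
    n = len(parent)
    B = np.zeros((n, n))
    for u in range(n):
        v = u
        while v != -1:
            B[v, u] = 1.0
            v = parent[v]
    return B

def path_features(B, pairs):
    """Row k is 1 on every edge (indexed by child node) of the path between pairs[k]."""
    return np.array([np.abs(B[:, i] - B[:, j]) for i, j in pairs])

def nonneg_lasso(Z, d, lam=1e-3, n_iter=5000):
    """Minimise 0.5 * ||d - Z w||^2 + lam * sum(w) subject to w >= 0 (projected ISTA)."""
    step = 1.0 / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = Z.T @ (Z @ w - d)
        # Prox of lam * sum(w) restricted to w >= 0: shift by lam, then clip at zero.
        w = np.maximum(0.0, w - step * (grad + lam))
    return w

# Toy check: nodes at positions 0, 1, 3 on a line, tree = path 0 - 1 - 2.
x = np.array([0.0, 1.0, 3.0])
pairs = [(0, 1), (0, 2), (1, 2)]
Z = path_features(subtree_matrix([-1, 0, 1]), pairs)
d = np.array([abs(x[i] - x[j]) for i, j in pairs])
w = nonneg_lasso(Z, d)
# w is close to [0, 1, 2]; in this example the L1 penalty shrinks each nonzero weight by lam / 3
```

The nonnegativity constraint keeps every fitted edge length valid, so the result still defines a tree metric; the L1 term additionally pushes unneeded edges to zero length.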
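The Sinkhorn baseline above uses regularization 10⁻² and at most 100 iterations. With a regularization that small, the kernel exp(−M / reg) underflows for typical costs, so the sketch below runs Sinkhorn in the log domain on dual potentials. It is a plain NumPy illustration, not POT's implementation; the source itself relies on POT, whose `ot.sinkhorn2` accepts a `numItermax` argument for the iteration cap. Function names are hypothetical.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def sinkhorn_cost(a, b, M, reg=1e-2, n_iter=100):
    """Entropic OT in the log domain; returns <P, M> for the final plan P.

    The plan is P[i, j] = exp((f[i] + g[j] - M[i, j]) / reg); working with the
    potentials f, g avoids forming exp(-M / reg) directly.
    """
    f = np.zeros(len(a))
    g = np.zeros(len(b))
    for _ in range(n_iter):
        # Update f so the row sums of P equal a, then g so the column sums equal b.
        f = reg * np.log(a) - reg * logsumexp((g[None, :] - M) / reg, axis=1)
        g = reg * np.log(b) - reg * logsumexp((f[:, None] - M) / reg, axis=0)
    P = np.exp((f[:, None] + g[None, :] - M) / reg)
    return float(np.sum(P * M))

# Two points at 0 and 1 moved to 2 and 3: in 1-D every coupling with these
# marginals costs E[target] - E[source] = 2.5 - 0.5 = 2.
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
M = np.abs(np.array([0.0, 1.0])[:, None] - np.array([2.0, 3.0])[None, :])
cost = sinkhorn_cost(a, b, M)
```

The sketch assumes strictly positive marginals, since it takes log(a) and log(b); zero-mass bins would have to be dropped first.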
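The classification protocol above uses a 90%/10% train/test split followed by ten-nearest-neighbor (10-NN) classification on a precomputed distance matrix (for example, TWDs between documents). The NumPy sketch below shows that protocol on toy 1-D data. The random permutation, the seed, the majority-vote tie-breaking, and the function names are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def split_90_10(n, seed=0):
    """Random 90% / 10% train/test split of n sample indices (seed is hypothetical)."""
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(round(0.9 * n))
    return perm[:n_train], perm[n_train:]

def knn_predict(D, y_train, k=10):
    """Majority vote over the k nearest training samples.

    D[i, j] is the distance from test sample i to training sample j.
    Labels must be non-negative integers; ties go to the smallest label.
    """
    nearest = np.argsort(D, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])

# Toy data: two well-separated 1-D clusters standing in for document distances.
x = np.concatenate([np.arange(20.0), 100.0 + np.arange(20.0)])
y = np.array([0] * 20 + [1] * 20)
train, test = split_90_10(len(x))
D = np.abs(x[test][:, None] - x[train][None, :])
pred = knn_predict(D, y[train])
```

Only the test-to-train block of the distance matrix is needed at prediction time, so the cost is dominated by computing those distances rather than by the vote itself.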