Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Tree-Sliced Variants of Wasserstein Distances

Authors: Tam Le, Makoto Yamada, Kenji Fukumizu, Marco Cuturi

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluated the proposed TSW kernel k_TSW (Equation (5)) for comparing empirical measures in word embedding-based document classification and topological data analysis.
Researcher Affiliation | Collaboration | Tam Le, RIKEN AIP, Japan, EMAIL; Makoto Yamada, Kyoto University & RIKEN AIP, Japan, EMAIL; Kenji Fukumizu, ISM, Japan & RIKEN AIP, Japan, EMAIL; Marco Cuturi, Google Brain, Paris & CREST ENSAE, EMAIL
Pseudocode | Yes | Algorithm 1 Partition_Tree_Metric(s, X, x_s, h, H_T)
Open Source Code | Yes | We have released code for these tools (footnote 2: https://github.com/lttam/TreeWasserstein).
Open Datasets | Yes | We evaluated k_TSW on four datasets: TWITTER, RECIPE, CLASSIC and AMAZON, following the approach of Word Mover's distances [39], for document classification with SVM.
Dataset Splits | Yes | For SVM, we randomly split each dataset into 70%/30% for training and test with 100 repeats, choose hyper-parameters through cross validation, choose 1/t from {1, q_10, q_20, q_50} where q_s is the s% quantile of a subset of corresponding distances, observed on a training set, use one-vs-one strategy with Libsvm [12] for multi-class classification, and choose SVM regularization from 10^{-2:1:2}.
Hardware Specification | Yes | We ran experiments with Intel Xeon CPU E7-8891v3 (2.80GHz), and 256GB RAM.
Software Dependencies | No | The paper mentions 'Libsvm [12]' but does not provide specific version numbers for software dependencies needed for replication.
Experiment Setup | Yes | For SVM, we randomly split each dataset into 70%/30% for training and test with 100 repeats, choose hyper-parameters through cross validation, choose 1/t from {1, q_10, q_20, q_50} where q_s is the s% quantile of a subset of corresponding distances, observed on a training set, use one-vs-one strategy with Libsvm [12] for multi-class classification, and choose SVM regularization from 10^{-2:1:2}.
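
The quoted experiment setup can be illustrated with a small sketch of the hyper-parameter grids and the repeated random splits. This is not the authors' code: all function names are hypothetical, the quantile uses linear interpolation (the quoted passage does not say which convention was used), and the regularization grid is read as 10^{-2:1:2}, meaning exponents from -2 to 2 in steps of 1, which is an assumption. The SVM training and cross validation themselves are left out.

```python
import random


def quantile(values, pct):
    """Return the pct% quantile of values (linear interpolation; an assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def inverse_t_candidates(train_distances, percents=(10, 20, 50)):
    """Candidate values for 1/t: 1 plus the 10%, 20%, 50% quantiles of training distances."""
    return [1.0] + [quantile(train_distances, p) for p in percents]


def regularization_grid(low=-2, high=2):
    """Assumed reading of the grid: 10**e for integer exponents e from low to high."""
    return [10.0 ** e for e in range(low, high + 1)]


def repeated_splits(n_samples, train_fraction=0.7, repeats=100, seed=0):
    """Yield (train_indices, test_indices) for repeated random 70%/30% splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]
```

In a full pipeline, each split would supply the training distances passed to `inverse_t_candidates`, and cross validation on the training part would pick one value from each grid before evaluating on the test part.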