Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Scalable Sobolev IPM for Probability Measures on a Graph

Authors: Tam Le, Truyen Nguyen, Hideitsu Hino, Kenji Fukumizu

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	6. Experiments: In this section, we illustrate the fast computation for the regularized Sobolev IPM, which is comparable to the Sobolev transport (ST), and several-order faster than the standard optimal transport (OT) for measures on a graph. We then show preliminary evidences on the advantages of the regularized Sobolev IPM kernels to compare probability measures on a given graph under the same settings for document classification and for TDA.
Researcher Affiliation	Academia	(1) Department of Advanced Data Science, The Institute of Statistical Mathematics (ISM), Tokyo, Japan; (2) The University of Akron, Ohio, US. Correspondence to: Tam Le <EMAIL>.
Pseudocode	No	The paper only describes methods in paragraph text and mathematical formulations. There are no clearly labeled pseudocode or algorithm blocks in the main text or appendices.
Open Source Code	Yes	Additionally, we have released code for our proposed approach. (Footnote 1: The code repository is on https://github.com/lttam/Sobolev-IPM.)
Open Datasets	Yes	We consider 4 popular document datasets: TWITTER, RECIPE, CLASSIC, AMAZON... We consider orbit recognition on the synthesized Orbit dataset (Adams et al., 2017), and object classification on a 10-class subset of MPEG7 dataset (Latecki et al., 2000) as in Le et al. (2022).
Dataset Splits	Yes	We randomly split each dataset into 70%/30% for training and test respectively, with 10 repeats, and use 1-vs-1 strategy for SVM classification.
Hardware Specification	No	For computational devices, we run all of our experiments on commodity hardware.
Software Dependencies	No	The paper mentions using 'word2vec word embedding' and 'kernelized support vector machine (SVM)' but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup	Yes	Typically, hyper-parameters are chosen via cross validation. Concretely, SVM regularization is chosen from {0.01, 0.1, 1, 10}, and kernel hyperparameter is chosen from {1/q_s, 1/(2q_s), 1/(5q_s)} with s = 10, 20, ..., 90, where we write q_s for the s% quantile of a subset of corresponding distances on training set.
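
The split protocol and hyperparameter grid quoted in the table can be illustrated with a minimal Python sketch. This is not the authors' code. The function names (kernel_param_grid, repeated_splits), the random seed, and the quantile interpolation method are assumptions made for illustration; the quoted text specifies only the grid values, the 70%/30% split, and the 10 repeats.

```python
import random
import statistics

# SVM regularization candidates, as quoted in the Experiment Setup row.
SVM_REG_GRID = [0.01, 0.1, 1, 10]


def kernel_param_grid(train_distances):
    """Build the candidates {1/q_s, 1/(2 q_s), 1/(5 q_s)} for s = 10, ..., 90.

    q_s is the s% quantile of a sample of training-set distances.
    Assumption: distances are positive, so no quantile is zero.
    Assumption: "inclusive" interpolation; the source names no method.
    """
    # With n=10, statistics.quantiles returns the 9 cut points at 10%..90%.
    deciles = statistics.quantiles(train_distances, n=10, method="inclusive")
    return [1.0 / (scale * q_s) for q_s in deciles for scale in (1, 2, 5)]


def repeated_splits(n_samples, train_frac=0.7, repeats=10, seed=0):
    """Return `repeats` random (train_indices, test_indices) pairs."""
    rng = random.Random(seed)  # hypothetical seed, not stated in the source
    cut = int(round(train_frac * n_samples))
    splits = []
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        splits.append((indices[:cut], indices[cut:]))
    return splits
```

Under these assumptions, the kernel grid has 27 candidates (9 quantiles times 3 scalings), which combine with the 4 regularization values to give 108 configurations to cross-validate on each training split.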