Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Scalable Sobolev IPM for Probability Measures on a Graph
Authors: Tam Le, Truyen Nguyen, Hideitsu Hino, Kenji Fukumizu
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments In this section, we illustrate the fast computation for the regularized Sobolev IPM, which is comparable to the Sobolev transport (ST), and several-order faster than the standard optimal transport (OT) for measures on a graph. We then show preliminary evidences on the advantages of the regularized Sobolev IPM kernels to compare probability measures on a given graph under the same settings for document classification and for TDA. |
| Researcher Affiliation | Academia | (1) Department of Advanced Data Science, The Institute of Statistical Mathematics (ISM), Tokyo, Japan; (2) The University of Akron, Ohio, US. Correspondence to: Tam Le <EMAIL>. |
| Pseudocode | No | The paper only describes methods in paragraph text and mathematical formulations. There are no clearly labeled pseudocode or algorithm blocks in the main text or appendices. |
| Open Source Code | Yes | Additionally, we have released code for our proposed approach. [Footnote 1] The code repository is on https://github.com/lttam/Sobolev-IPM. |
| Open Datasets | Yes | We consider 4 popular document datasets: TWITTER, RECIPE, CLASSIC, AMAZON... We consider orbit recognition on the synthesized Orbit dataset (Adams et al., 2017), and object classification on a 10-class subset of MPEG7 dataset (Latecki et al., 2000) as in Le et al. (2022). |
| Dataset Splits | Yes | We randomly split each dataset into 70%/30% for training and test respectively, with 10 repeats, and use 1-vs-1 strategy for SVM classification. |
| Hardware Specification | No | For computational devices, we run all of our experiments on commodity hardware. |
| Software Dependencies | No | The paper mentions using 'word2vec word embedding' and 'kernelized support vector machine (SVM)' but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Typically, hyper-parameters are chosen via cross validation. Concretely, SVM regularization is chosen from {0.01, 0.1, 1, 10}, and kernel hyperparameter is chosen from {1/q_s, 1/(2q_s), 1/(5q_s)} with s = 10, 20, ..., 90, where we write q_s for the s% quantile of a subset of corresponding distances on training set. |
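
The Dataset Splits and Experiment Setup rows describe a repeated random 70%/30% split and a quantile-based hyperparameter grid. The following is a minimal sketch of how such a grid and such splits could be generated, not the authors' implementation. The function names (`quantile`, `kernel_param_grid`, `random_splits`), the linear-interpolation quantile rule, and the seed are all assumptions made for illustration; the paper does not specify them.

```python
import random

# SVM regularization candidates quoted in the Experiment Setup row.
SVM_REGULARIZATION = [0.01, 0.1, 1, 10]


def quantile(values, pct):
    """Return the pct-th percentile (0-100) of a non-empty list.

    Uses linear interpolation between order statistics; this rule is an
    assumption, since the source does not say how quantiles are computed.
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def kernel_param_grid(train_distances):
    """Build the candidates {1/q_s, 1/(2 q_s), 1/(5 q_s)} for s = 10, ..., 90.

    `train_distances` stands in for the "subset of corresponding distances
    on training set"; every q_s must be non-zero for the division to work.
    """
    grid = []
    for s in range(10, 100, 10):
        q_s = quantile(train_distances, s)
        grid.extend(1.0 / (c * q_s) for c in (1, 2, 5))
    return grid


def random_splits(n_items, train_frac=0.7, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for repeated random train/test splits."""
    rng = random.Random(seed)  # hypothetical seed, fixed for repeatability
    n_train = int(round(train_frac * n_items))
    for _ in range(repeats):
        idx = list(range(n_items))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]
```

With 9 quantile levels and 3 scalings, the kernel grid has 27 candidates, which combined with the 4 regularization values gives 108 configurations to cross-validate per split under this reading of the setup.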