Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Optimal Transport for structured data with application on graphs
Authors: Vayer Titouan, Nicolas Courty, Romain Tavenard, Chapel Laetitia, Rémi Flamary
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now illustrate the behaviour of FGW on synthetic and real datasets. The algorithms presented in the previous section have been implemented using the Python Optimal Transport toolbox (Flamary & Courty, 2017) and will be released upon publication.4.2. Graph-structured data classificationTable 1. Average classification accuracy on the graph datasets with vector attributes. |
| Researcher Affiliation | Academia | 1Univ. Bretagne-Sud, CNRS, IRISA, F-56000 Vannes 2Univ. Cˆote d Azur, CNRS, OCA Lagrange, F-06000 Nice 3Univ. Rennes, CNRS, LETG, F-35000 Rennes. |
| Pseudocode | Yes | Algorithm 1 Conditional Gradient (CG) for FGWAlgorithm 2 Line-search for CG (q = 2) |
| Open Source Code | No | The algorithms presented in the previous section have been implemented using the Python Optimal Transport toolbox (Flamary & Courty, 2017) and will be released upon publication. |
| Open Datasets | Yes | BZR, COX2 (Sutherland et al., 2003), PROTEINS, ENZYMES (Borgwardt & Kriegel, 2005), CUNEIFORM (Kriege et al., 2018) and SYNTHETIC (Feragen et al., 2013) are vector attributed graphs. MUTAG (Debnath et al., 1991), PTC-MR (Kriege et al., 2016) and NCI1 (Wale et al., 2008) contain graphs with discrete attributes derived from small molecules. IMDB-B, IMDB-M (Yanardag & Vishwanathan, 2015) contain unlabeled graphs derived from social networks. All datas are available in (Kersting et al., 2016). |
| Dataset Splits | Yes | To provide compare between the methods, most papers about graph classification usually perform a nested cross validation (using 9 folds for training, 1 for testing, and reporting the average accuracy of this experiment repeated 10 times) and report accuracies of the other methods taken from the original papers. |
| Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research. |
| Software Dependencies | No | The algorithms presented in the previous section have been implemented using the Python Optimal Transport toolbox (Flamary & Courty, 2017) and will be released upon publication. |
| Experiment Setup | Yes | For all methods using SVM, we cross validate the parameter C {10^-7, 10^-6, ..., 10^7}. The range of the WL parameter H is {0, 1..., 10}, and we also compute this kernel with H fixed at 2, 4. The decay factor λ for RWK {10^-6, 10^-5..., 10^-2}, for the GK kernel we set the graphlet size κ = 3 and cross validate the precision level ϵ and the confidence δ as in the original paper (Shervashidze et al., 2009). The tmax parameter for PROPAK is chosen within {1, 3, 5, 8, 10, 15, 20}. For PSCN, we choose the normalized betweenness centrality as labeling procedure and cross validate the batch size in {10, 15, ..., 35} and number of epochs in {10, 20, ..., 100}. Finally for FGW, γ is cross validated within {2^-10, 2^-9, ..., 2^10} and α is cross validated via a logspace search in [0, 0.5] and symmetrically [0.5, 1] (15 values are drawn). |