Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Position: Graph Matching Systems Deserve Better Benchmarks

Authors: Indradyumna Roy, Saswat Meher, Eeshaan Jain, Soumen Chakrabarti, Abir De

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Table 2, we quantify the impact of train-test leakage on baseline models using Intra-test-pairs and Cross-train-test-pairs. We evaluate these models on the default dataset test splits, which include leakage, and report the Mean Squared Error (MSE) and Kendall Tau Correlation (Ktau) between predicted and ground truth GED values under both Intra-test-pairs and Cross-train-test-pairs.
Researcher Affiliation	Academia	(1) IIT Bombay, Mumbai, India; (2) EPFL, Lausanne, Switzerland.
Pseudocode	Yes	Algorithm 1: Construct Edit Path from Permutation Matrix; Algorithm 2: Dataset Processing with Cost Variants; Algorithm 3: GENERATEPAIRS; Algorithm 4: COMPUTEOPTIMALPATHS; Algorithm 5: GENERATECOSTVARIANTS
Open Source Code	Yes	All code and datasets used in this work have been made publicly available at https://anonymous.4open.science/r/better-graph-matching-7146/.
Open Datasets	Yes	All code and datasets used in this work have been made publicly available at https://anonymous.4open.science/r/better-graph-matching-7146/. Using four leakage-free datasets (Mutag, Code2, Molhiv, Molpcba) from GRAPHEDX
Dataset Splits	Yes	This unique set is split into S_train, S_val, and S_test in a 60:20:20 ratio.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, or cloud instance types) used for running its experiments.
Software Dependencies	No	We explored two libraries, GEDLIB (Blumenthal et al., 2019) and NetworkX (Hagberg & Conway, 2020), for GED calculation using combinatorial approaches. These are library names with citations, but specific version numbers for these or any other software dependencies are not provided.
Experiment Setup	No	The paper does not explicitly provide details about specific hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) or system-level training settings for its experiments.
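
As an illustration of two items quoted in the table above, the 60:20:20 split of a de-duplicated set and the MSE / Kendall tau comparison of predicted against ground-truth GED values, the following is a minimal sketch and not the authors' code. The function names, the fixed seed, the example graph ids, and the choice of the tau-a variant of Kendall's coefficient are all assumptions made for this example; the paper may use a different variant or library.

```python
import random
from itertools import combinations


def split_unique(items, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle a de-duplicated collection and cut it into train/val/test parts.

    Hypothetical helper: de-duplicating before splitting means no item
    can appear in more than one split.
    """
    unique = sorted(set(items))
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    rng.shuffle(unique)
    n_train = int(ratios[0] * len(unique))
    n_val = int(ratios[1] * len(unique))
    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    test = unique[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


def mse(pred, true):
    """Mean squared error between predicted and ground-truth values."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def kendall_tau_a(pred, true):
    """Kendall tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(true)), 2):
        sign = (pred[i] - pred[j]) * (true[i] - true[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(true) * (len(true) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical graph ids, with two duplicates that the split removes.
graph_ids = [f"g{i}" for i in range(10)] + ["g0", "g1"]
s_train, s_val, s_test = split_unique(graph_ids)
```

Splitting after de-duplication is what keeps the three sets disjoint; splitting first and de-duplicating afterwards would not give that guarantee.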