Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

GRAIL: Graph Edit Distance and Node Alignment using LLM-Generated Code

Authors: Samidha Verma, Arushi Goyal, Ananya Mathur, Ankit Anand, Sayan Ranu

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on seven datasets confirm that GRAIL not only surpasses state-of-the-art GED approximation methods in prediction quality but also achieves robust cross-domain generalization across diverse graph distributions.
Researcher Affiliation	Collaboration	¹Yardi School of Artificial Intelligence, IIT Delhi, India; ²Department of Computer Science and Engineering, IIT Delhi, India; ³Google DeepMind, Montreal, Canada. Correspondence to: Samidha Verma <EMAIL>, Arushi Goyal <EMAIL>, Ananya Mathur <EMAIL>, Ankit Anand <EMAIL>, Sayan Ranu <EMAIL>.
Pseudocode	Yes	Algorithm 1: The greedy approach. Require: Train data T = {T_1, ..., T_n}, where T_t = (G_t, G'_t) is a pair of graphs; budget b. Ensure: solution set A_greedy, |A_greedy| = b. 1: A_greedy ← ∅; 2: while size(A_greedy) < b (within budget) do; 3: P* ← arg max_{P ∈ D \ A_greedy} {J(A_greedy) − J(A_greedy ∪ {P})}; 4: A_greedy ← A_greedy ∪ {P*}; 5: Return A_greedy
Open Source Code	Yes	The codebase of GRAIL and the programs generated for the various datasets are available at https://github.com/idea-iitd/Grail.
Open Datasets	Yes	While AIDS, Linux and IMDB are obtained from Morris et al. (2020), the other four datasets are made available by Hu et al. (2021).
Dataset Splits	Yes	To construct the test set for a particular dataset, we select 1000 graph pairs uniformly at random and compute their true GED. ... Neural Algorithms: All neural approaches are trained on 10,000 graph pairs per dataset. ... GRAIL and GRAIL-MIX: GRAIL is trained with only 1,000 graph pairs per dataset.
Hardware Specification	Yes	All experiments ran on a machine equipped with an Intel Xeon Gold 6142 CPU @1GHz and a GeForce GTX 1080 Ti GPU.
Software Dependencies	No	For the LLM, we use Gemini 1.5 Pro. In particular, we have used the initial stable version of Gemini 1.5 Pro, i.e., gemini-1.5-pro-001, which was released on May 24, 2024. No other software dependencies with specific version numbers are provided for the implementation itself.
Experiment Setup	Yes	Hyper-parameters: Table H lists the hyper-parameters used for GRAIL. k stands for the number of functions per response generated by the LLM and b is the function budget employed for submodularity while training. ... Table H. Hyper-parameters used for GRAIL: k = 2; b = 15; number of islands = 5; temperature = 0.99; algorithm for bipartite matching = Neighbor-biased mapper (He & Singh, 2006)
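
The greedy selection quoted in the Pseudocode row can be illustrated with a minimal Python sketch. It is not taken from the GRAIL codebase. The function name `greedy_select`, the toy table `PREDICTED`, the `FALLBACK` bound, and the objective `total_best_distance` are all hypothetical. The sketch also assumes that the objective J is a quantity to be minimized, so the marginal gain of a candidate P is J(A) − J(A ∪ {P}), and that the candidate pool (D in the pseudocode) is a finite collection.

```python
from typing import Callable, Hashable, Iterable, List, Sequence


def greedy_select(
    candidates: Iterable[Hashable],
    objective: Callable[[Sequence[Hashable]], float],
    budget: int,
) -> List[Hashable]:
    """Greedily pick up to `budget` candidates that most reduce `objective`."""
    remaining = list(candidates)
    selected: List[Hashable] = []
    while len(selected) < budget and remaining:
        current = objective(selected)
        # Marginal gain of adding p: J(A) - J(A + {p}); larger is better.
        best = max(remaining, key=lambda p: current - objective(selected + [p]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: each candidate program predicts an edit distance
# for each of three training graph pairs (lower is better).
PREDICTED = {
    "p1": [3, 9, 9],
    "p2": [9, 2, 9],
    "p3": [4, 3, 8],
    "p4": [9, 9, 1],
}
FALLBACK = 10  # assumed trivial upper bound when no program is selected


def total_best_distance(selected: Sequence[str]) -> int:
    """Assumed objective: sum over pairs of the best prediction among `selected`."""
    n_pairs = 3
    return sum(
        min([FALLBACK] + [PREDICTED[p][i] for p in selected])
        for i in range(n_pairs)
    )


chosen = greedy_select(PREDICTED, total_best_distance, budget=2)
print(chosen, total_best_distance(chosen))
```

With this toy data the first pick is the candidate that is decent on every pair, and the second pick is the one that covers the pair the first handles worst. The loop re-evaluates the objective for every remaining candidate in each round, which is acceptable for a small budget such as the b = 15 reported in the Experiment Setup row.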