Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Towards Unsupervised Training of Matching-based Graph Edit Distance Solver via Preference-aware GAN

Authors: Wei Huang, Hanchen Wang, Dong Wen, SHAOZHEN MA, Wenjie Zhang, Xuemin Lin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on benchmark datasets demonstrate that our GEDRanker enables the matching-based GED solver to achieve near-optimal solution quality without any ground-truth supervision. The source code is available at https://github.com/piupiupiuu/GEDRanker. 5 Experiments 5.1 Experimental Settings Datasets. We conduct experiments on three widely used real world datasets: AIDS700 [17], Linux [33, 17], and IMDB [17, 34]. Each dataset is split into 60%, 20% and 20% as training graphs, validation graphs and testing graphs, respectively. We construct training, validation, and testing graph pairs, and generate their corresponding ground-truth GEDs and node matching matrices for evaluation following the strategy described in [11]. More details of the datasets can be found in Appendix B.1. Baselines. We categorize baseline methods and our GEDRanker into three groups: (1) Traditional approximation methods: Hungarian [15], VJ [16] and GEDGW [12]; (2) Supervised hybrid methods: Noah [24], GENN-A* [25], MATA* [26], GEDGNN [11], GEDIOT [12] and Diff GED [13]; (3) Unsupervised hybrid method: GEDRanker. Due to the space limitations, the implementation details of our GEDRanker could be found in Appendix B.2. Table 1: Overall performance on testing graph pairs. Methods with a running time exceeding 24 hours are marked with -. : higher is better. : lower is better. Bold: best in its own group. Trad, SL and UL denotes Traditional, Supervised Learning and Unsupervised Learning, respectively. Results for baselines, except for GEDIOT and GEDGW, are taken from [13].
Researcher Affiliation	Academia	Wei Huang University of New South Wales EMAIL Hanchen Wang University of Technology Sydney EMAIL Dong Wen University of New South Wales EMAIL Shaozhen Ma University of New South Wales EMAIL Wenjie Zhang University of New South Wales EMAIL Xuemin Lin Shanghai Jiaotong University EMAIL
Pseudocode	Yes	Algorithm 1 Gumbel-Sinkhorn Input: gϕ(G1, G2, πt, t), number of iterations K, temperature τ; 1: Sample Gumbel noise Z R\|V1\| \|V2\| with Z[v][u] Gumbel(0, 1); 2: ˆπgϕ (gϕ(G1, G2, πt, t) + Z)/τ; 3: for i = 1 to K do 4: ˆπgϕ[v][u] ˆπgϕ[v][u] log P u V2 exp(ˆπgϕ[v][u ]); (row-wise normalization) 5: ˆπgϕ[v][u] ˆπgϕ[v][u] log P v V1 exp(ˆπgϕ[v ][u]); (column-wise normalization) 6: end for 7: ˆπgϕ exp(ˆπgϕ); 8: return ˆπgϕ; Algorithm 2 Reverse Process Input: A pair of graphs (G1, G2); 1: πt S Bernoulli(0.5)\|V1\| \|V2\|; 2: for i = S to 1 do 3: pϕ(eπ0\|πti, G1, G2) σ(gϕ(G1, G2, πti, ti)); 4: if i = 1 then 5: πti 1 pϕ(πti 1\|πti, G1, G2); 6: else 7: π0 Greedy(pϕ(πti 1\|πti, G1, G2)); 8: end if 9: end for 10: return π0; Algorithm 3 GEDRanker Training Strategy 1: Initialize π Greedy(ˆπ), πlast Greedy(ˆπ), with ˆπ U(0, 1)\|V1\| \|V2\|; 2: for each epoch do 3: Sample t U({1, ..., T}) and πt q(πt\| π); 4: ˆπgϕ S(gϕ(G1, G2, πt, t)); 5: πgϕ Greedy(ˆπgϕ); 6: Compute Dθ(G1, G2, ˆπgϕ), Dθ(G1, G2, π), Dθ(G1, G2, πlast); 7: Compute LDθ and update Dθ; 8: Compute Dθ(G1, G2, ˆπgϕ); 9: Compute Lgϕ and update gϕ; 10: if c(G1, G2, πgϕ) < c(G1, G2, π) then 11: Update π πgϕ; 12: end if 13: Update πlast πgϕ; 14: end for
Open Source Code	Yes	Extensive experiments on benchmark datasets demonstrate that our GEDRanker enables the matching-based GED solver to achieve near-optimal solution quality without any ground-truth supervision. The source code is available at https://github.com/piupiupiuu/GEDRanker.
Open Datasets	Yes	We conduct experiments on three widely used real world datasets: AIDS700 [17], Linux [33, 17], and IMDB [17, 34]. Each dataset is split into 60%, 20% and 20% as training graphs, validation graphs and testing graphs, respectively. We construct training, validation, and testing graph pairs, and generate their corresponding ground-truth GEDs and node matching matrices for evaluation following the strategy described in [11]. More details of the datasets can be found in Appendix B.1.
Dataset Splits	Yes	Each dataset is split into 60%, 20% and 20% as training graphs, validation graphs and testing graphs, respectively. We construct training, validation, and testing graph pairs, and generate their corresponding ground-truth GEDs and node matching matrices for evaluation following the strategy described in [11]. More details of the datasets can be found in Appendix B.1.
Hardware Specification	Yes	Testbed. All experiments are performed using an NVIDIA GeForce RTX 3090 24GB and an Intel Core i9-12900K CPU with 128GB RAM.
Software Dependencies	No	The paper does not specify particular software packages with version numbers. It mentions using an optimizer (RMSProp) and network architectures (GIN, AGNN), but not the specific software implementations (e.g., PyTorch, Python versions) with their respective version numbers.
Experiment Setup	Yes	Training. During training, the number of time steps T for the forward process is set to 1000 with a linear noise schedule, where β0 = 10^-4 and βT = 0.02. For the Gumbel-Sinkhorn operator, the number of iterations K is set to 5 and the temperature τ is set to 1. Moreover, we train gϕ and Dθ for 200 epochs with a batch size of 128. The loss weight λ for Lexplore in Lgϕ (Equation 6) is linearly annealed from from 1 to 0 during the first 100 epochs, and fixed at 0 for the remaining 100 epochs. For the optimizer, We adopt RMSProp with a learning rate of 0.001 and a weight decay of 5 * 10^-4. Inference. During inference, the number of time steps S for the reverse process is set to 10 with a linear denoising schedule. For each test graph pair, we generate k = 100 candidate node matching matrices in parallel.