Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SBGD: Improving Graph Diffusion Generative Model via Stochastic Block Diffusion

Authors: Junwei Su, Shan Wu

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical results show that SBGD achieves significant memory improvements (up to 6×) while maintaining comparable or even superior graph generation performance relative to state-of-the-art methods. Furthermore, experiments demonstrate that SBGD better generalizes to unseen graph sizes. The significance of SBGD extends beyond being a scalable and effective GDGM; it also exemplifies the principle of modularization in generative modeling, offering a new avenue for exploring generative models by decomposing complex tasks into more manageable components.
Researcher Affiliation	Academia	(1) School of Computing and Data Science, University of Hong Kong; (2) School of Resources and Environmental Engineering, Hefei University of Technology. Correspondence to: Junwei Su <EMAIL>, Shan Wu <EMAIL>.
Pseudocode	Yes	Pseudo-code for training and sampling is provided in Appendix B, along with additional technical details of the implementation. ... Algorithm 1 SBGD Training Algorithm ... Algorithm 2 SBGD sampling algorithm.
Open Source Code	No	The paper does not contain any explicit statements about releasing source code for the methodology, nor does it provide a link to a code repository. It only mentions that pseudocode is provided in the appendix.
Open Datasets	Yes	Datasets. We consider five real and synthetic datasets with varying sizes and connectivity levels: Planar-graphs, Contextual Stochastic Block Model (cSBM) (Deshpande et al., 2018), Proteins (Dobson & Doig, 2003), QM9 (Wu et al., 2018), OGBN-Arxiv, and OGBN-Products (Hu et al., 2021).
Dataset Splits	No	The paper mentions using 'test and generated graphs' for evaluation but does not specify the splitting methodology (e.g., percentages, counts, or predefined splits) for the datasets used in the experiments.
Hardware Specification	Yes	Testbed. Our experiments were conducted on a Dell PowerEdge C4140. The key specifications of this server, pertinent to our research, include: CPU: Intel Xeon Gold 6230 processors equipped with 20 cores and 40 threads; GPU: NVIDIA Tesla V100 SXM2 units equipped with 32GB of memory; Memory: an aggregate of 256GB RAM, distributed across eight 32GB RDIMM modules; and Operating System: Ubuntu 18.04 LTS.
Software Dependencies	No	The paper mentions using a 'Graph Transformer' and the 'Adam optimizer', and refers to the 'METIS algorithm' from the 'DGL library', but no specific version numbers are provided for these software components.
Experiment Setup	Yes	For training our network, we adopt the widely-used Adam optimizer, tuning only the learning rate as the primary hyperparameter. To determine the optimal values for other hyperparameters in our model, we perform a simple grid search over the following ranges: Number of layers: [2, 4], Hidden dimension: [8, 16, 32, 64, 128, 256], Learning rate: [0.1, 0.05, 0.01, 0.005, 0.001], Diffusion Length T: [50, 100, 200], Sampling Steps: [100, 200, 500, 1000]. For the variance schedule, we follow the one in (Jo et al., 2022).
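
The grid quoted in the Experiment Setup row can be enumerated as a rough sketch, which helps gauge how many runs a full sweep would take. This is not the authors' code: the dictionary keys and the `iter_configs` helper are hypothetical names, each bracketed range is assumed to be a list of discrete values (so "Number of layers: [2, 4]" is read as the two values 2 and 4), and all five ranges are assumed to be swept jointly, which the paper excerpt does not confirm.

```python
from itertools import product

# Search ranges copied from the Experiment Setup row.
# Key names are illustrative, not taken from the paper.
GRID = {
    "num_layers": [2, 4],
    "hidden_dim": [8, 16, 32, 64, 128, 256],
    "learning_rate": [0.1, 0.05, 0.01, 0.005, 0.001],
    "diffusion_length": [50, 100, 200],
    "sampling_steps": [100, 200, 500, 1000],
}


def iter_configs(grid):
    """Yield one dict per point in the Cartesian product of the grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(GRID))
print(len(configs))  # 2 * 6 * 5 * 3 * 4 = 720 combinations
```

Under these assumptions a full joint sweep covers 720 configurations per dataset. If sampling steps are tuned only at generation time rather than during training, the training grid shrinks to 180.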