Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SBGD: Improving Graph Diffusion Generative Model via Stochastic Block Diffusion
Authors: Junwei Su, Shan Wu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that SBGD achieves significant memory improvements (up to 6×) while maintaining comparable or even superior graph generation performance relative to state-of-the-art methods. Furthermore, experiments demonstrate that SBGD better generalizes to unseen graph sizes. The significance of SBGD extends beyond being a scalable and effective GDGM; it also exemplifies the principle of modularization in generative modeling, offering a new avenue for exploring generative models by decomposing complex tasks into more manageable components. |
| Researcher Affiliation | Academia | ¹School of Computing and Data Science, University of Hong Kong; ²School of Resources and Environmental Engineering, Hefei University of Technology. Correspondence to: Junwei Su <EMAIL>, Shan Wu <EMAIL>. |
| Pseudocode | Yes | Pseudo-code for training and sampling is provided in Appendix B, along with additional technical details of the implementation. ... Algorithm 1 SBGD Training Algorithm ... Algorithm 2 SBGD sampling algorithm. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology, nor does it provide a link to a code repository. It only mentions that pseudocode is provided in the appendix. |
| Open Datasets | Yes | Datasets. We consider five real and synthetic datasets with varying sizes and connectivity levels: Planar-graphs, Contextual Stochastic Block Model (cSBM) (Deshpande et al., 2018), Proteins (Dobson & Doig, 2003), QM9 (Wu et al., 2018), OGBN-Arxiv, and OGBN-Products (Hu et al., 2021). |
| Dataset Splits | No | The paper mentions using 'test and generated graphs' for evaluation but does not specify the splitting methodology (e.g., percentages, counts, or predefined splits) for the datasets used in the experiments. |
| Hardware Specification | Yes | Testbed. Our experiments were conducted on a Dell PowerEdge C4140. The key specifications of this server pertinent to our research include: CPU: Intel Xeon Gold 6230 processors equipped with 20 cores and 40 threads; GPU: NVIDIA Tesla V100 SXM2 units equipped with 32GB of memory; Memory: an aggregate of 256GB RAM, distributed across eight 32GB RDIMM modules; and Operating System: Ubuntu 18.04 LTS. |
| Software Dependencies | No | The paper mentions using a 'Graph Transformer' and the 'Adam optimizer', and refers to the 'METIS algorithm' from the 'DGL library', but no specific version numbers are provided for these software components. |
| Experiment Setup | Yes | For training our network, we adopt the widely used Adam optimizer, tuning only the learning rate as the primary hyperparameter. To determine the optimal values for other hyperparameters in our model, we perform a simple grid search over the following ranges: Number of layers: [2, 4], Hidden dimension: [8, 16, 32, 64, 128, 256], Learning rate: [0.1, 0.05, 0.01, 0.005, 0.001], Diffusion Length T: [50, 100, 200], Sampling Steps: [100, 200, 500, 1000]. For the variance schedule, we follow the one in (Jo et al., 2022). |
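
To give a sense of the tuning budget implied by the Experiment Setup row, the Python sketch below enumerates the quoted search ranges. It assumes the grid search is a full Cartesian product over all five ranges; the paper does not state this, and the dictionary keys and the `grid_configs` helper are hypothetical names chosen for illustration, not part of any released code.

```python
from itertools import product

# Search ranges quoted in the Experiment Setup row.
# Key names are hypothetical; the paper releases no code.
GRID = {
    "num_layers": [2, 4],
    "hidden_dim": [8, 16, 32, 64, 128, 256],
    "learning_rate": [0.1, 0.05, 0.01, 0.005, 0.001],
    "diffusion_length_T": [50, 100, 200],
    "sampling_steps": [100, 200, 500, 1000],
}


def grid_configs(grid):
    """Yield every combination of the ranges as a dict.

    Assumption: a full Cartesian product, i.e. no pruning or
    staged search, which the source does not confirm.
    """
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(grid_configs(GRID))
print(len(configs))  # 2 * 6 * 5 * 3 * 4 = 720
print(configs[0])
```

Under that assumption the grid contains 720 configurations. Because the paper reports neither per-run cost nor the selected values, the sketch bounds the search effort rather than reproducing the authors' procedure.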