Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Joint Relational Database Generation via Graph-Conditional Diffusion Models

Authors: Mohamed Amine Ketata, David Lüdke, Leo Schwinn, Stephan Günnemann

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on six real-world RDBs demonstrate that our approach substantially outperforms autoregressive baselines in modeling multi-hop inter-table correlations and achieves state-of-the-art performance on single-table fidelity metrics. Our code is available at https://github.com/ketatam/rdb-diffusion.
Researcher Affiliation	Academia	Mohamed Amine Ketata, David Lüdke, Leo Schwinn, Stephan Günnemann School of Computation, Information and Technology & Munich Data Science Institute Technical University of Munich, Germany Correspondence to: EMAIL
Pseudocode	Yes	Algorithm 1 Training 1: repeat 2: Sample v ∼ Uniform(V) ... Algorithm 2 Sampling 1: X^(T) ← {x_v^(T) ∼ N(0, I) | v ∈ V} 2: for t = T, ..., 1 do
Open Source Code	Yes	Our code is available at https://github.com/ketatam/rdb-diffusion.
Open Datasets	Yes	We use six real-world relational databases in our experiments. Five were used in the evaluation setup in [15]: Berka [35], Instacart 05 [36], MovieLens [37, 38], CCS [37], and California [39]. In addition, we use the RelBench-F1 database [40] from RelBench [41], a recently introduced benchmark for relational deep learning [17].
Dataset Splits	Yes	To assess the performance of GRDM in the task of missing value imputation, we design a new experiment on the California database, which consists of two tables: a parent table and a child table. First, we split the database into a training set and a holdout set based on the parent table, and use the training set to train the diffusion model in the same way as presented in the main text.
Hardware Specification	Yes	All experiments are run using a single NVIDIA A100-PCIE-40GB GPU.
Software Dependencies	Yes	For each of these metrics, we use the Kolmogorov-Smirnov (KS) statistic and the Total Variation (TV) distance to compare distributions of numerical and categorical values, respectively. To compare correlations of column pairs, we use the Pearson correlation coefficient for numerical values and the contingency table for categorical values. All metrics are normalized to lie between 0 (least fidelity) and 100 (highest fidelity). Detailed descriptions of these metric computations are in Appendix D.3. ... Evaluation Metrics. To evaluate the quality of the synthetic data in terms of fidelity, we follow [15] and report the following metrics implemented in the SDMetrics package [42]. ... URL https://docs.sdv.dev/sdmetrics/. Version 0.18.0.
Experiment Setup	Yes	For the GNN, we use the heterogeneous version of the GraphSAGE model [29] with the number of layers set to the number of hops K from the diffusion model. In all our experiments, we set K = 1. We use sum-based aggregation and a hidden dimension of 256 for the GNN. ... We use layers of sizes 512, 1024, 1024, 1024, 1024, 512, if the corresponding table has more than 10,000 rows, and use a smaller MLP with layers of sizes 512, 1024, 1024, 512 otherwise. ... We follow [15] and set the diffusion timesteps T = 2000 and use cosine scheduler for the noise schedule. ... We also follow [15] and use the AdamW optimizer with learning rate 6e-4 and weight decay 1e-5. We use 100,000 training steps for California and 200,000 on all other databases. We use a batch size of 4096 on all databases.
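The Pseudocode row quotes only the opening lines of Algorithm 2. Every node's features start as Gaussian noise, and the loop then runs from t = T down to 1. The sketch below shows the shape of such a joint reverse loop under several assumptions: a standard DDPM ancestral update, an epsilon-prediction denoiser, and continuous features only. `predict_noise` is a hypothetical stand-in for the graph-conditional denoiser. The update rule is the generic DDPM step, not necessarily the one in the authors' repository.

```python
import numpy as np


def sample_all_nodes(num_nodes, dim, betas, predict_noise, rng):
    """Jointly sample features for every node, mirroring the loop shape of Algorithm 2.

    Assumptions (not from the paper): continuous features only, a generic DDPM
    ancestral update, and a hypothetical predict_noise(X, t) callable that
    returns an array shaped like X.
    """
    betas = np.asarray(betas, dtype=np.float64)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    T = len(betas)

    # Line 1: X^(T) <- {x_v^(T) ~ N(0, I) | v in V}, one row per node v
    X = rng.standard_normal((num_nodes, dim))

    # Line 2: for t = T, ..., 1 (all nodes are denoised together at each step)
    for t in range(T, 0, -1):
        i = t - 1  # 0-based index into the schedule arrays
        eps = predict_noise(X, t)
        # Posterior mean under the epsilon parameterization
        mean = (X - betas[i] / np.sqrt(1.0 - alpha_bars[i]) * eps) / np.sqrt(alphas[i])
        if t > 1:
            # Add fresh noise with variance beta_t, except at the final step
            X = mean + np.sqrt(betas[i]) * rng.standard_normal(X.shape)
        else:
            X = mean
    return X


# Usage with a dummy denoiser that always predicts zero noise
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 50)
X0 = sample_all_nodes(5, 3, betas, lambda X, t: np.zeros_like(X), rng)
print(X0.shape)  # (5, 3)
```

All conditioning on the relational graph is left inside `predict_noise`. The sketch only illustrates the initialization and the descending timestep loop.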
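The Dataset Splits row describes splitting the two-table California database "based on the parent table." One natural reading is that each child row follows its parent, so no parent's children straddle the training and holdout sets. The sketch below assumes that reading. The 20% holdout fraction, the `parent_id` foreign-key name, and the seed are hypothetical; the quoted text does not give them.

```python
import random


def split_by_parent(parent_ids, child_rows, fk="parent_id", holdout_frac=0.2, seed=0):
    """Split parents at random; every child row follows its parent.

    holdout_frac, fk and seed are illustrative defaults, not values from the paper.
    """
    shuffled = sorted(parent_ids)
    random.Random(seed).shuffle(shuffled)
    holdout_ids = set(shuffled[: round(holdout_frac * len(shuffled))])

    train_parents = [p for p in parent_ids if p not in holdout_ids]
    holdout_parents = [p for p in parent_ids if p in holdout_ids]
    # Children are assigned through their foreign key, never independently
    train_children = [c for c in child_rows if c[fk] not in holdout_ids]
    holdout_children = [c for c in child_rows if c[fk] in holdout_ids]
    return (train_parents, train_children), (holdout_parents, holdout_children)


parents = [1, 2, 3, 4, 5]
children = [{"parent_id": p, "value": 10 * p + k} for p in parents for k in range(2)]
(train_p, train_c), (hold_p, hold_c) = split_by_parent(parents, children)
print(len(train_p), len(hold_p))  # 4 1
```

Splitting on parents rather than on individual child rows keeps each parent's full set of children together. This matters when the model is later asked to impute values for holdout parents it never saw during training.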
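The Software Dependencies row describes fidelity metrics normalized to lie between 0 and 100. The NumPy sketch below makes that normalization concrete under the following assumptions:

- a KS or TV distance d becomes the score 100 · (1 − d);
- Pearson coefficients r_real and r_syn are compared as 100 · (1 − |r_real − r_syn| / 2);
- categorical pairs are compared by the TV distance between their joint frequency tables.

These formulas are intended to mirror the definitions SDMetrics documents for KSComplement, TVComplement, CorrelationSimilarity and ContingencySimilarity. Appendix D.3 of the paper and the SDMetrics package itself remain authoritative for reported numbers.

```python
from collections import Counter

import numpy as np


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a = np.sort(np.asarray(real, dtype=float))
    b = np.sort(np.asarray(synthetic, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def tv_distance(real, synthetic):
    """Total variation distance between two categorical frequency distributions."""
    fa, fb = Counter(real), Counter(synthetic)
    cats = set(fa) | set(fb)
    return 0.5 * sum(abs(fa[c] / len(real) - fb[c] / len(synthetic)) for c in cats)


def ks_score(real, synthetic):
    # Numerical column: 100 means identical empirical CDFs
    return 100.0 * (1.0 - ks_statistic(real, synthetic))


def tv_score(real, synthetic):
    # Categorical column: 100 means identical category frequencies
    return 100.0 * (1.0 - tv_distance(real, synthetic))


def correlation_score(real_x, real_y, syn_x, syn_y):
    # Numerical pair: Pearson coefficients lie in [-1, 1], so the gap is at most 2
    r_real = np.corrcoef(real_x, real_y)[0, 1]
    r_syn = np.corrcoef(syn_x, syn_y)[0, 1]
    return 100.0 * (1.0 - abs(r_real - r_syn) / 2.0)


def contingency_score(real_x, real_y, syn_x, syn_y):
    # Categorical pair: TV distance between joint (contingency-table) frequencies
    return tv_score(list(zip(real_x, real_y)), list(zip(syn_x, syn_y)))


print(tv_score(["a", "b"], ["a", "a"]))  # 50.0
```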
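The Experiment Setup row fixes T = 2000 with a cosine noise schedule and chooses the denoiser MLP's layer sizes by table size. The sketch below collects those quoted settings. It assumes the cosine schedule takes the common Nichol and Dhariwal form, ᾱ_t = f(t)/f(0) with f(t) = cos²(((t/T + s)/(1 + s)) · π/2), s = 0.008, and β_t capped at 0.999. The quoted text does not spell out the exact variant. The helper names and dictionary keys are hypothetical, not the repository's configuration format.

```python
import numpy as np


def cosine_betas(T=2000, s=0.008, max_beta=0.999):
    """Cosine noise schedule (assumed Nichol and Dhariwal form)."""
    t = np.arange(T + 1, dtype=np.float64)
    f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped to avoid a singular last step
    return np.clip(1.0 - alpha_bar[1:] / alpha_bar[:-1], 0.0, max_beta)


def mlp_hidden_sizes(num_rows):
    """Denoiser MLP layer sizes, chosen by the number of rows in the table."""
    if num_rows > 10_000:
        return [512, 1024, 1024, 1024, 1024, 512]
    return [512, 1024, 1024, 512]


def training_steps(database):
    # 100,000 steps for California, 200,000 for every other database
    return 100_000 if database == "California" else 200_000


# Values quoted in the Experiment Setup row; the key names are illustrative
CONFIG = {
    "gnn": "heterogeneous GraphSAGE",
    "num_hops_K": 1,  # number of GNN layers is set to K
    "gnn_aggregation": "sum",
    "gnn_hidden_dim": 256,
    "diffusion_steps_T": 2000,
    "optimizer": "AdamW",
    "learning_rate": 6e-4,
    "weight_decay": 1e-5,
    "batch_size": 4096,
}

betas = cosine_betas(CONFIG["diffusion_steps_T"])
print(len(betas), mlp_hidden_sizes(10_000))  # 2000 [512, 1024, 1024, 512]
```

Note that the size rule is a strict inequality. A table with exactly 10,000 rows gets the smaller four-layer MLP.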