Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models

Authors: Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, Xi He

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations on multi-table datasets of varying sizes show that ClavaDDPM significantly outperforms existing methods for these long-range dependencies while remaining competitive on utility metrics for single-table data.
Researcher Affiliation | Collaboration | Wei Pang (1,2), Masoumeh Shafieinejad (2), Lucy Liu (3), Stephanie Hazlewood (3), and Xi He (1,2). (1) University of Waterloo, (2) Vector Institute, (3) Royal Bank of Canada
Pseudocode | Yes | Algorithm 1 ClavaDDPM: Latent learning and table augmentation. Algorithm 2 ClavaDDPM: Training. Algorithm 3 ClavaDDPM: Synthesis.
Open Source Code | Yes | We upload supplementary materials including code for reproducibility.
Open Datasets | Yes | We experiment with five real-world multi-relational datasets including California [6], Instacart 05 [23], Berka [4], MovieLens [39, 32], and CCS [32].
Dataset Splits | No | The paper mentions a train-test split for MLE evaluation but does not specify a separate validation split for model training/tuning across all experiments.
Hardware Specification | Yes | All experiments are conducted with an NVIDIA A6000 GPU and 32 CPU cores, with a time limit of 7 days.
Software Dependencies | No | The paper mentions software like MLP, AdamW optimizer, CTGAN, TabDDPM, and SDV library, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We perform a comprehensive ablation study using Berka (for it has the most complex multi-table structure) on each component of ClavaDDPM and provide empirical tuning suggestions. The full results are in Table 2.