reproducibilityindex.ai

ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models

Authors: Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, Xi He

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive evaluations on multi-table datasets of varying sizes show that ClavaDDPM significantly outperforms existing methods for these long-range dependencies while remaining competitive on utility metrics for single-table data.
Researcher Affiliation	Collaboration	Wei Pang (1,2), Masoumeh Shafieinejad (2), Lucy Liu (3), Stephanie Hazlewood (3), and Xi He (1,2). (1) University of Waterloo; (2) Vector Institute; (3) Royal Bank of Canada
Pseudocode	Yes	Algorithm 1 ClavaDDPM: Latent learning and table augmentation. Algorithm 2 ClavaDDPM: Training. Algorithm 3 ClavaDDPM: Synthesis.
Open Source Code	Yes	We upload supplementary materials including code for reproducibility.
Open Datasets	Yes	We experiment with five real-world multi-relational datasets including California [6], Instacart 05 [23], Berka [4], MovieLens [39, 32], and CCS [32].
Dataset Splits	No	The paper mentions a train-test split for MLE evaluation but does not specify a separate validation split for model training/tuning across all experiments.
Hardware Specification	Yes	All experiments are conducted with an NVIDIA A6000 GPU and 32 CPU cores, with a time limit of 7 days.
Software Dependencies	No	The paper mentions components and tools such as MLPs, the AdamW optimizer, CTGAN, TabDDPM, and the SDV library, but does not provide specific version numbers for these dependencies.
Experiment Setup	Yes	We perform a comprehensive ablation study using Berka (for it has the most complex multi-table structure) on each component of ClavaDDPM and provide empirical tuning suggestions. The full results are in Table 2.