Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
Authors: Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, Xi He
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on multi-table datasets of varying sizes show that ClavaDDPM significantly outperforms existing methods for these long-range dependencies while remaining competitive on utility metrics for single-table data. |
| Researcher Affiliation | Collaboration | Wei Pang (1, 2), Masoumeh Shafieinejad (2), Lucy Liu (3), Stephanie Hazlewood (3), and Xi He (1, 2). (1) University of Waterloo; (2) Vector Institute; (3) Royal Bank of Canada |
| Pseudocode | Yes | Algorithm 1 ClavaDDPM: Latent learning and table augmentation. Algorithm 2 ClavaDDPM: Training. Algorithm 3 ClavaDDPM: Synthesis. |
| Open Source Code | Yes | We upload supplementary materials including code for reproducibility. |
| Open Datasets | Yes | We experiment with five real-world multi-relational datasets including California [6], Instacart 05 [23], Berka [4], MovieLens [39, 32], and CCS [32]. |
| Dataset Splits | No | The paper mentions a train-test split for MLE evaluation but does not specify a separate validation split for model training/tuning across all experiments. |
| Hardware Specification | Yes | All experiments are conducted with an NVIDIA A6000 GPU and 32 CPU cores, with a time limit of 7 days. |
| Software Dependencies | No | The paper mentions software like MLP, AdamW optimizer, CTGAN, TabDDPM, and SDV library, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We perform a comprehensive ablation study using Berka (for it has the most complex multi-table structure) on each component of ClavaDDPM and provide empirical tuning suggestions. The full results are in Table 2. |