ClavaDDPM: Multi-relational Data Synthesis with Cluster-guided Diffusion Models
Authors: Wei Pang, Masoumeh Shafieinejad, Lucy Liu, Stephanie Hazlewood, Xi He
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on multi-table datasets of varying sizes show that ClavaDDPM significantly outperforms existing methods for these long-range dependencies while remaining competitive on utility metrics for single-table data. |
| Researcher Affiliation | Collaboration | Wei Pang (1, 2), Masoumeh Shafieinejad (2), Lucy Liu (3), Stephanie Hazlewood (3), and Xi He (1, 2). Affiliations: (1) University of Waterloo; (2) Vector Institute; (3) Royal Bank of Canada. |
| Pseudocode | Yes | Algorithm 1: ClavaDDPM: Latent learning and table augmentation. Algorithm 2: ClavaDDPM: Training. Algorithm 3: ClavaDDPM: Synthesis. |
| Open Source Code | Yes | We upload supplementary materials including code for reproducibility. |
| Open Datasets | Yes | We experiment with five real-world multi-relational datasets including California [6], Instacart 05 [23], Berka [4], MovieLens [39, 32], and CCS [32]. |
| Dataset Splits | No | The paper mentions a train-test split for MLE evaluation but does not specify a separate validation split for model training/tuning across all experiments. |
| Hardware Specification | Yes | All experiments are conducted with an NVIDIA A6000 GPU and 32 CPU cores, with a time limit of 7 days. |
| Software Dependencies | No | The paper mentions components and tools such as MLPs, the AdamW optimizer, CTGAN, TabDDPM, and the SDV library, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We perform a comprehensive ablation study using Berka (for it has the most complex multi-table structure) on each component of ClavaDDPM and provide empirical tuning suggestions. The full results are in Table 2. |