Debiasing Synthetic Data Generated by Deep Generative Models
Authors: Alexander Decruyenaere, Heidelinde Dehaene, Paloma Rabaey, Johan Decruyenaere, Christiaan Polet, Thomas Demeester, Stijn Vansteelandt
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We exemplify our proposal through a simulation study on toy data and two case studies on real-world data, highlighting the importance of tailoring DGMs for targeted data analysis. |
| Researcher Affiliation | Collaboration | Alexander Decruyenaere (Ghent University Hospital, SYNDARA); Heidelinde Dehaene (Ghent University Hospital, SYNDARA); Paloma Rabaey (Ghent University, imec); Christiaan Polet (Ghent University Hospital, SYNDARA); Johan Decruyenaere (Ghent University Hospital, SYNDARA); Thomas Demeester (Ghent University, imec); Stijn Vansteelandt (Ghent University, Department of Applied Mathematics, Computer Science and Statistics) |
| Pseudocode | Yes | Algorithm 1: Data generating process for hypothetical disease. |
| Open Source Code | Yes | Our code is available on Github: https://github.com/syndara-lab/debiased-generation. |
| Open Datasets | Yes | International Stroke Trial (IST) dataset (Sandercock et al., 2011) |
| Dataset Splits | No | The paper describes generating synthetic data and evaluating its utility, but does not specify a distinct 'validation' dataset split of the original data in the conventional machine learning sense. While methods like k-fold cross-fitting are mentioned, they are not explicitly defined as a separate validation set split used for hyperparameter tuning in the main experimental setup. |
| Hardware Specification | Yes | All experiments were run on our institutional high performance computing cluster using a single GPU (NVIDIA Ampere A100; 80GB GPU memory) and a single CPU (AMD EPYC 7413). |
| Software Dependencies | No | The paper mentions software packages such as Synthcity and SDV and implies the use of Python, but it does not give version numbers for these tools or for any other libraries needed to reproduce the experiments. |
| Experiment Setup | Yes | The DGMs were trained using the default hyperparameters as suggested by the package Synthcity (Qian et al., 2023). We also show results obtained for other hyperparameters (the default in the package SDV (Patki et al., 2016)) in Appendix A.7.4. A comparison of the default hyperparameters in both packages is provided in Tables A1 and A2. |
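The Dataset Splits row contrasts k-fold cross-fitting with a conventional validation split. The difference can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kfold_indices`, `fit_ols`, `cross_fit_predictions`), the simple linear-regression nuisance model, and the choice k = 5 are all hypothetical. In cross-fitting, every observation is predicted by a model trained only on the other folds, so no data are permanently held out for validation.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form simple linear regression y ~ a + b*x.

    Hypothetical stand-in for whatever nuisance model a real analysis would use.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_fit_predictions(
    x: Sequence[float], y: Sequence[float], k: int = 5, seed: int = 0
) -> List[float]:
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    n = len(y)
    preds = [0.0] * n
    for fold in kfold_indices(n, k, seed):
        held_out = set(fold)
        # Fit the nuisance model on the remaining k-1 folds only.
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        # Predict for the held-out fold with the model fitted elsewhere.
        for i in fold:
            preds[i] = a + b * x[i]
    return preds
```

For example, with `x = [0.0, 1.0, ..., 9.0]` and `y = 2x + 1`, every out-of-fold prediction reproduces `y`, because each training subset recovers the same line. With noisy data, these out-of-fold predictions would instead feed a downstream estimator, which is the role cross-fitting plays rather than hyperparameter tuning.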