Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
STaSy: Score-based Tabular data Synthesis
Authors: Jayoung Kim, Chaejeong Lee, Noseong Park
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we also conduct rigorous experimental studies in terms of the generative task trilemma: sampling quality, diversity, and time. In our experiments with 15 benchmark tabular datasets and 7 baselines, our method outperforms existing methods in terms of task-dependant evaluations and diversity. |
| Researcher Affiliation | Academia | Jayoung Kim, Chaejeong Lee, and Noseong Park Department of Artificial Intelligence Yonsei University Seoul, South Korea EMAIL |
| Pseudocode | Yes | Algorithm 1 shows the overall training process for our STaSy. |
| Open Source Code | Yes | Source codes used in the experiments are available in the supplementary material. By following the README guidance, the main results are easily reproducible. |
| Open Datasets | Yes | The raw data of 15 datasets are available online: Credit: https://www.kaggle.com/mlg-ulb/creditcardfraud (DbCL 1.0) ... Spambase: https://archive.ics.uci.edu/ml/datasets/spambase (CC BY 4.0) |
| Dataset Splits | Yes | The train-test split ratio is 80% and 20%, respectively. |
| Hardware Specification | Yes | Our software and hardware environments are as follows: UBUNTU 18.04 LTS, PYTHON 3.8.2, PYTORCH 1.8.1, CUDA 11.4, and NVIDIA Driver 470.42.01, i9 CPU, and NVIDIA RTX 3090. |
| Software Dependencies | Yes | Our software and hardware environments are as follows: UBUNTU 18.04 LTS, PYTHON 3.8.2, PYTORCH 1.8.1, CUDA 11.4, and NVIDIA Driver 470.42.01, i9 CPU, and NVIDIA RTX 3090. |
| Experiment Setup | Yes | Hyperparameter settings for the best models are in Table 27. We have three SDE types, which are VE, VP, and sub-VP, and three layer types as shown in Appendix C: Concat, Squash, and Concatsquash. We use a learning rate in {2e-03, 2e-04}. We search for α0 and β0, in total, with 9 combinations using α0 = {0.20, 0.25, 0.30} and β0 = {0.80, 0.90, 0.95}. |
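
The search space quoted in the Experiment Setup row can be enumerated with a short sketch. This is an illustration only, not the authors' code: the function names are hypothetical, the learning rates are read as 2e-03 and 2e-04, and treating the SDE type, layer type, and learning rate as a full Cartesian product with the (α0, β0) pairs is an assumption that the quoted text does not confirm.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
SDE_TYPES = ["VE", "VP", "sub-VP"]
LAYER_TYPES = ["Concat", "Squash", "Concatsquash"]
LEARNING_RATES = [2e-03, 2e-04]  # read as 2e-03 and 2e-04 (assumption)
ALPHA0 = [0.20, 0.25, 0.30]
BETA0 = [0.80, 0.90, 0.95]


def alpha_beta_pairs():
    """Return every (alpha0, beta0) pair: 3 x 3 = 9 combinations."""
    return list(product(ALPHA0, BETA0))


def full_grid():
    """Return one dict per configuration.

    Assumption: all five settings are crossed with each other. The quoted
    text only states that alpha0 and beta0 give 9 combinations.
    """
    keys = ("sde", "layer", "lr", "alpha0", "beta0")
    values = product(SDE_TYPES, LAYER_TYPES, LEARNING_RATES, ALPHA0, BETA0)
    return [dict(zip(keys, combo)) for combo in values]


if __name__ == "__main__":
    print(len(alpha_beta_pairs()), "alpha0/beta0 pairs")
    print(len(full_grid()), "configurations under the full-product assumption")
```

Under the full-product assumption the grid has 3 × 3 × 2 × 9 = 162 configurations; the settings actually selected for each dataset are the ones the paper reports in its Table 27.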