Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Distributional Learning of Variational AutoEncoder: Application to Synthetic Data Generation

Authors: Seunghwan An, Jong-June Jeon

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper contains a "4 Experiments" section with subsections such as "4.2 Evaluation Metrics" and "4.3 Results", reporting results in tables (e.g., "Table 1: Averaged MLu metrics (MARE, F1)"). |
| Researcher Affiliation | Academia | "Seunghwan An and Jong-June Jeon, Department of Statistical Data Science, University of Seoul, S. Korea" |
| Pseudocode | Yes | "Algorithm 1: Discretization of Estimated CDF" |
| Open Source Code | Yes | "We release the code at https://github.com/an-seunghwan/DistVAE." |
| Open Datasets | Yes | "For evaluation, we consider following real tabular datasets: covertype, credit, loan, adult, cabs, and kings (see Appendix A.8 for detailed data descriptions). ... covertype: https://www.kaggle.com/datasets/uciml/forest-cover-type-dataset" |
| Dataset Splits | No | "Table 8: Description of datasets. #C represents the number of continuous and ordinal variables. #D denotes the number of discrete variables. Dataset Train/Test Split ... covertype 45k/5k" (only train/test splits are specified, with no validation split). |
| Hardware Specification | Yes | "We run all experiments using Geforce RTX 3090 GPU" |
| Software Dependencies | No | "Our experimental codes are all available with pytorch." (No PyTorch version is specified, and no other software dependencies with versions are listed.) |
| Experiment Setup | Yes | "Table 9: Hyper-parameter settings for tabular dataset experiments. Model / epochs / batch size / learning rate / β (or decoder std range) / d / M ... DistVAE: 100 / 256 / 0.001 / 0.5 / 2 / 10" |
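For concreteness, the Table 9 row and the dataset-split finding above can be written out as a small configuration sketch. This is a minimal illustration, not the authors' code: the dictionary key names are hypothetical, the values are copied from the quoted Table 9 row, and the 10% validation fraction is purely illustrative, since the paper specifies only a train/test split (e.g., covertype 45k/5k) and no validation set.

```python
# Hyper-parameters reported for DistVAE in Table 9 (key names are our own;
# the meanings of d and M follow the table's column labels and are not
# further defined in this summary).
distvae_config = {
    "epochs": 100,
    "batch_size": 256,
    "learning_rate": 0.001,
    "beta": 0.5,  # listed as "β (or decoder std range)" in Table 9
    "d": 2,
    "M": 10,
}

# Covertype split sizes reported in Table 8 (train/test only).
n_train, n_test = 45_000, 5_000

# Illustrative assumption: carve a 10% validation set out of the train split.
# The paper itself does NOT describe a validation split.
val_frac = 0.1
n_val = int(n_train * val_frac)
n_fit = n_train - n_val
```

A reader reproducing the experiments would need to decide on such a validation protocol themselves, since only the train/test sizes are given.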