GOGGLE: Generative Modelling for Tabular Data by Learning Relational Structure

Authors: Tennison Liu, Zhaozhi Qian, Jeroen Berrevoets, Mihaela van der Schaar

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using real-world datasets, we provide empirical evidence that the proposed method is effective in generating realistic synthetic data and exploiting domain knowledge for downstream tasks." and "We employ both qualitative and quantitative approaches to demonstrate that GOGGLE achieves consistent improvements over state-of-the-art benchmarks in generating synthetic data and exploiting prior knowledge for better downstream performance."
Researcher Affiliation | Academia | Tennison Liu (University of Cambridge, tl522@cam.ac.uk); Zhaozhi Qian (University of Cambridge, zq224@cam.ac.uk); Jeroen Berrevoets (University of Cambridge, jb2384@cam.ac.uk); Mihaela van der Schaar (University of Cambridge & Alan Turing Institute, mv472@cam.ac.uk)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is provided on GitHub." https://github.com/tennisonliu/GOGGLE; https://github.com/vanderschaarlab/GOGGLE
Open Datasets | Yes | "We employ 8 real-world datasets from the UCI repository [16], and 2 datasets from the BN repository [39]." ... "We use 10 datasets in total, including 8 UCI datasets [16], specifically Adult, Breast, Covertype, Credit, White, Red, Mice, Musk, and 2 Bayesian Network repository datasets [39], specifically ECOLI and MAGIC-IRRI." https://www.bnlearn.com/bnrepository/
Dataset Splits | Yes | "The data is split 60-20-20 into train, validation and test sets and reported results are averaged over 10 runs."
Hardware Specification | Yes | "All experiments are run on an NVIDIA Tesla K40C GPU, taking less than an hour to complete."
Software Dependencies | No | The paper mentions that models are implemented in "PyTorch [48]" but does not provide specific version numbers for PyTorch or any other software dependencies such as Python or CUDA.
Experiment Setup | Yes | "For all methods compared, we consider hyperparameters including batch size ∈ {64, 128} and learning rate ∈ {1e-3, 5e-3, 1e-2}. We include a weight decay of 1e-3 [58]. ... For the graph sparsity term, we consider regularization penalty λ ∈ {1e-3, 1e-2, 1e-1}. For the KL divergence penalty, we consider α ∈ {0.1, 0.5, 1.0}. All models are trained for a maximum of 1000 epochs, with early stopping if there is no improvement on the validation set for 50 epochs."
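The experiment-setup row above can be sketched in code. This is a minimal illustration, not the authors' implementation: the grid values, weight decay, epoch budget, and patience are taken from the quoted text, but enumerating the full Cartesian product and the exact early-stopping bookkeeping are assumptions (the paper does not state the search strategy), and `validate` is a hypothetical stand-in for one epoch of training plus validation.

```python
from itertools import product

# Hyperparameter grid as reported in the reproducibility row above.
GRID = {
    "batch_size": [64, 128],
    "learning_rate": [1e-3, 5e-3, 1e-2],
    "lambda_sparsity": [1e-3, 1e-2, 1e-1],  # graph sparsity penalty (λ)
    "alpha_kl": [0.1, 0.5, 1.0],            # KL divergence penalty (α)
}
WEIGHT_DECAY = 1e-3
MAX_EPOCHS = 1000
PATIENCE = 50  # epochs without validation improvement before stopping


def configs(grid):
    """Enumerate every combination in the grid (full grid search is an
    assumption; the paper does not specify the search strategy)."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def train_with_early_stopping(validate, max_epochs=MAX_EPOCHS, patience=PATIENCE):
    """Generic early-stopping loop: halt after `patience` consecutive
    epochs with no improvement in validation loss. `validate(epoch)` is
    a placeholder for one epoch of training followed by validation."""
    best, since_best = float("inf"), 0
    for epoch in range(max_epochs):
        val_loss = validate(epoch)
        if val_loss < best:
            best, since_best = val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best, epoch
```

With this grid, a full search visits 2 x 3 x 3 x 3 = 54 configurations per dataset.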
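The dataset-splits row can likewise be made concrete. A minimal sketch of a 60-20-20 shuffled split, repeated once per run so results can be averaged over 10 runs as the paper reports; the shuffling and seeding details here are assumptions, since the paper does not specify them.

```python
import random


def split_60_20_20(n_rows, seed=0):
    """Shuffle row indices and split them 60/20/20 into
    train/validation/test index lists."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_train = int(0.6 * n_rows)
    n_val = int(0.2 * n_rows)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


# One independent split per run; the paper averages results over 10 runs.
splits = [split_60_20_20(1000, seed=s) for s in range(10)]
```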