Generative Trees: Adversarial and Copycat

Authors: Richard Nock, Mathieu Guillame-Bert

ICML 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We provide experiments on tasks including fake/real distinction and missing data imputation. |
| Researcher Affiliation | Industry | Google Research. Correspondence to: Richard Nock <richardnock@google.com>. |
| Pseudocode | Yes | Algorithm 1 TD-GEN(G, h) |
| Open Source Code | No | The paper does not provide an explicit link to its own open-source code; it mentions an implementation in Java and the use of third-party code such as CTGAN. |
| Open Datasets | Yes | The domains we used included simulated domains and domains from the UCI, Kaggle and the Stanford Open Policing project. The UCI (Dua & Graff, 2021)... |
| Dataset Splits | Yes | We use a 5-fold CV experiment... For each domain, we carry out a 5-fold CV, leaving 20% of the data for testing and the rest for training. (A minimal split sketch follows the table.) |
| Hardware Specification | Yes | We ran part of the experiments on a MacBook Pro with 16 GB RAM and a 2 GHz Quad-Core Intel(R) Core i5(R) processor, and part on a desktop Intel(R) Xeon(R) 3.70GHz with 12 cores and 64 GB RAM. |
| Software Dependencies | Yes | We have used the R mice package V 3.13.0... We have used the Python implementation with default values... To learn the additional Random Forests and Gradient Boosted Decision Trees involved in experiments TRAIN-GEN, GEN-DISCRIM and GEN-AUG, we used the TensorFlow Decision Forests library... our implementation is in Java. (A CTGAN usage sketch follows the table.) |
| Experiment Setup | Yes | We grow generative trees with three different sizes: very small (10 splits, i.e. 21 total nodes), medium (300 splits, i.e. 601 total nodes) and maximal... We compare our three GT training flavours to three neural network based training methods relying on the state-of-the-art CTGAN (Xu et al., 2019). We use the CTGAN code with default parameters and a varying number of epochs, choosing small (10), medium (300) and large (1K) training epochs. For Random Forests, we use 300 trees with max depth 16. For Gradient Boosted Decision Trees, we use at most 300 trees, with 10% of the training dataset for validation and early stopping; max depth is 6. (A forest-configuration sketch follows the table.) |
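
The 5-fold protocol quoted under Dataset Splits (80% of each domain for training, 20% for testing per fold) is a standard cross-validation split. A minimal sketch, assuming pandas and scikit-learn; neither library nor the CSV filename comes from the paper:

```python
# Minimal sketch of the 5-fold cross-validation protocol described in the
# Dataset Splits row: each fold keeps 20% of the rows for testing and trains
# on the remaining 80%. The CSV path is a hypothetical placeholder.
import pandas as pd
from sklearn.model_selection import KFold

data = pd.read_csv("uci_domain.csv")  # placeholder for one UCI/Kaggle domain

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(data)):
    train_df = data.iloc[train_idx]  # ~80% of the rows
    test_df = data.iloc[test_idx]    # ~20% of the rows, held out for testing
    # ... train a generative tree or baseline on train_df, evaluate on test_df
    print(f"fold {fold}: {len(train_df)} train rows, {len(test_df)} test rows")
```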
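The CTGAN baselines described in the Experiment Setup row use default parameters with small (10), medium (300), or large (1K) training epoch budgets. A hedged sketch of that setting with the present-day `ctgan` Python package; the paper used the original authors' code (Xu et al., 2019), whose interface may differ, and both the discrete column names and `train_df` (a training fold from the split sketch above) are illustrative assumptions:

```python
# Sketch of the CTGAN baseline settings: default parameters, with the small,
# medium and large epoch budgets quoted in the Experiment Setup row.
# API assumed from the current `ctgan` PyPI package; the paper's runs used
# the original CTGAN code, which may expose a different interface.
from ctgan import CTGAN

for epochs in (10, 300, 1000):
    model = CTGAN(epochs=epochs)  # every other parameter left at its default
    model.fit(train_df, discrete_columns=["cat_feature_1", "cat_feature_2"])  # hypothetical columns
    synthetic = model.sample(len(train_df))  # one synthetic row per training row
```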
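For the Random Forests and Gradient Boosted Decision Trees trained in the TRAIN-GEN, GEN-DISCRIM and GEN-AUG experiments, the quoted hyperparameters are 300 trees with max depth 16 (RF) and at most 300 trees with max depth 6, a 10% validation split, and early stopping (GBDT). A sketch of those settings with TensorFlow Decision Forests; the label column name and the use of the Keras wrapper are assumptions, not details from the paper:

```python
# Sketch of the forest hyperparameters reported in the Experiment Setup row,
# expressed with the TensorFlow Decision Forests Keras API.
import tensorflow_decision_forests as tfdf

# `train_df` is a training fold from the split sketch above; "label" is a
# hypothetical target column name.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")

# Random Forest: 300 trees, maximum depth 16.
rf = tfdf.keras.RandomForestModel(num_trees=300, max_depth=16)
rf.fit(train_ds)

# Gradient Boosted Trees: at most 300 trees, max depth 6,
# 10% of the training data held out for validation-based early stopping.
gbdt = tfdf.keras.GradientBoostedTreesModel(
    num_trees=300,
    max_depth=6,
    validation_ratio=0.1,           # 10% of the training split for validation
    early_stopping="LOSS_INCREASE"  # stop when validation loss stops improving
)
gbdt.fit(train_ds)
```

TF-DF enables validation-based early stopping for gradient boosted trees by default, so the explicit `early_stopping` argument mainly documents the intent of the quoted setup.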