Generative Trees: Adversarial and Copycat
Authors: Richard Nock, Mathieu Guillame-Bert
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experiments on tasks including fake/real distinction and missing data imputation. |
| Researcher Affiliation | Industry | Google Research. Correspondence to: Richard Nock <richardnock@google.com>. |
| Pseudocode | Yes | Algorithm 1 TD-GEN(G, h) |
| Open Source Code | No | The paper does not provide an explicit link to its own open-source code; it states that its implementation is in Java and that it relies on third-party code such as CTGAN. |
| Open Datasets | Yes | The domains we used included simulated domains and domains from the UCI (Dua & Graff, 2021), Kaggle and the Stanford Open Policing project. |
| Dataset Splits | Yes | We use a 5-fold CV experiment... For each domain, we carry out a 5-fold CV, leaving 20% of the data for testing and the rest for training. (A split sketch follows the table.) |
| Hardware Specification | Yes | We ran part of the experiments on a MacBook Pro with 16 GB RAM and a 2 GHz quad-core Intel(R) Core i5(R) processor, and part on a desktop Intel(R) Xeon(R) 3.70 GHz with 12 cores and 64 GB RAM. |
| Software Dependencies | Yes | We have used the R mice package V 3.13.0... We have used the Python implementation with default values... To learn the additional Random Forests and Gradient Boosted Decision Trees involved in experiments TRAIN-GEN, GEN-DISCRIM and GEN-AUG, we used the TensorFlow Decision Forests library... our implementation is in Java. (TF-DF configuration sketch follows the table.) |
| Experiment Setup | Yes | We grow generative trees with three different sizes: very small (10 splits, i.e. 21 total nodes), medium (300 splits, i.e. 601 total nodes) and maximal... We compare our three GT training flavours to three neural-network-based training methods relying on the state-of-the-art CTGAN (Xu et al., 2019). We use CTGAN code with default parameters and a varying number of epochs, choosing small (10), medium (300) and large (1K) training epochs. For Random Forests, we use 300 trees with max depth 16. For Gradient Boosted Decision Trees, we use max 300 trees, with 10% of the training dataset for validation and early stopping; max depth is 6. (CTGAN and tree-ensemble configuration sketches follow the table.) |
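The dataset-splits row quotes a 5-fold cross-validation protocol in which each test fold holds 20% of a domain's data. The paper's own implementation is in Java; as a hedged illustration only, the sketch below reproduces that split in Python, assuming each domain is loaded as a pandas DataFrame and using scikit-learn's `KFold` (both are assumptions, not the authors' code).

```python
import pandas as pd
from sklearn.model_selection import KFold

def five_fold_splits(domain: pd.DataFrame, seed: int = 0):
    """Yield (train, test) frames; each test fold holds 20% of the rows."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(domain):
        yield domain.iloc[train_idx], domain.iloc[test_idx]

# Hypothetical usage on one tabular domain stored as a CSV file.
# domain = pd.read_csv("some_uci_domain.csv")
# for fold, (train_df, test_df) in enumerate(five_fold_splits(domain)):
#     ...  # train the generative model on train_df, evaluate on test_df
```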
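The software-dependencies and experiment-setup rows specify the auxiliary tree ensembles trained with the TensorFlow Decision Forests library: Random Forests with 300 trees and max depth 16, and Gradient Boosted Decision Trees with at most 300 trees, max depth 6, and 10% of the training data held out for validation and early stopping. A minimal sketch of those configurations in TF-DF's Keras API follows; the CSV path and label column are placeholders, not taken from the paper.

```python
import pandas as pd
import tensorflow_decision_forests as tfdf

train_df = pd.read_csv("train_fold.csv")  # placeholder path
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")  # placeholder label

# Random Forest: 300 trees with max depth 16, as quoted above.
rf = tfdf.keras.RandomForestModel(num_trees=300, max_depth=16)
rf.fit(train_ds)

# Gradient Boosted Decision Trees: at most 300 trees, max depth 6,
# with 10% of the training data used as a validation split for early stopping.
gbdt = tfdf.keras.GradientBoostedTreesModel(
    num_trees=300,
    max_depth=6,
    validation_ratio=0.1,
    early_stopping="LOSS_INCREASE",
)
gbdt.fit(train_ds)
```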
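The experiment-setup row also lists the generative-tree sizes and the CTGAN baselines. The node counts follow from binary-tree bookkeeping: each split turns one leaf into an internal node with two children, so s splits yield s+1 leaves and 2s+1 total nodes (10 → 21, 300 → 601). For the CTGAN baselines (default parameters, 10/300/1K epochs), a hedged sketch with the `ctgan` Python package is below; the file path and discrete-column list are hypothetical placeholders, and older package releases name the class `CTGANSynthesizer` rather than `CTGAN`.

```python
import pandas as pd
from ctgan import CTGAN  # older releases expose CTGANSynthesizer instead

train_df = pd.read_csv("train_fold.csv")     # placeholder path
discrete_columns = ["sex", "workclass"]      # hypothetical; depends on the domain

# Small / medium / large training regimes, as quoted in the experiment setup.
for epochs in (10, 300, 1000):
    model = CTGAN(epochs=epochs)             # all other parameters left at defaults
    model.fit(train_df, discrete_columns)
    synthetic = model.sample(len(train_df))  # same-size synthetic table per regime
```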