Generative Forests

Authors: Richard Nock, Mathieu Guillame-Bert

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the quality of generated data display substantial improvements compared to the state of the art.
Researcher Affiliation | Industry | Richard Nock (Google Research, richardnock@google.com); Mathieu Guillame-Bert (Google, gbm@google.com)
Pseudocode | Yes | Algorithm 1 INIT({Υ_t}_{t=1}^T); Algorithm 2 STARUPDATE(Υ, C, R); Algorithm 3 GF.BOOST(R, J, T)
Open Source Code | Yes | Our code is provided and commented in Appendix, Section V.2.
Open Datasets | Yes | We carried out experiments on a total of 21 datasets, from UCI [10], Kaggle, OpenML, the Stanford Open Policing Project, or simulated. All are presented in Appendix, Section V.1.
Dataset Splits | Yes | The evaluation pipeline is simple: we create for each domain a 5-fold stratified experiment. (A hedged split sketch is given after this table.)
Hardware Specification | Yes | We ran part of the experiments on a MacBook Pro with 16 GB RAM and a 2 GHz Quad-Core Intel Core i5 processor, and part on a desktop Intel(R) Xeon(R) 3.70 GHz machine with 12 cores and 64 GB RAM.
Software Dependencies | Yes | MICE: we used the R MICE package v3.13.0 with two choices of method for the round-robin (column-wise) prediction of missing values: CART [1] and random forests (RF) [42]. (A hedged imputation sketch is given after this table.)
Experiment Setup | Yes | In Table 5, contenders are parameterized as follows: ARFs learn sets of 200 trees; CT-GANs are trained for 1,000 epochs; Forest Flows and VCAEs are run with otherwise default parameters. We optimized MICE by choosing trees (CART) and random forests (RFs) as the supervised models, increasing the number of trees to 100 for better results.
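
The 5-fold stratified evaluation reported above can be reproduced with standard tooling. Below is a minimal sketch assuming scikit-learn and a pandas dataframe whose class column is named `label`; the file name and column name are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a 5-fold stratified experiment (assumes scikit-learn/pandas;
# the CSV path and "label" column are hypothetical, not taken from the paper).
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("dataset.csv")            # hypothetical dataset file
X = df.drop(columns=["label"])
y = df["label"]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    train, test = df.iloc[train_idx], df.iloc[test_idx]
    # ... train a generative model on `train`, score generated data against `test` ...
    print(f"fold {fold}: {len(train)} train rows, {len(test)} test rows")
```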
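The MICE baseline was run with the R MICE package, as quoted above. As a rough Python analogue only (not the authors' setup), scikit-learn's IterativeImputer performs the same column-wise round-robin imputation and accepts a tree-based estimator; the sketch below uses 100 random-forest trees to mirror the reported RF setting.

```python
# Rough Python analogue of MICE-style round-robin imputation.
# The paper used the R MICE package (v3.13.0) with CART / random-forest methods;
# this sketch is NOT that code, only an illustration of the same idea.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.2] = np.nan      # inject ~20% missing values

# Each column is predicted in turn from the others (round robin), as in MICE.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)
```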