Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generative Forests
Authors: Richard Nock, Mathieu Guillame-Bert
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the quality of generated data display substantial improvements compared to the state of the art. |
| Researcher Affiliation | Industry | Richard Nock (Google Research); Mathieu Guillame-Bert (Google) |
| Pseudocode | Yes | Algorithm 1 INIT($\{\Upsilon_t\}_{t=1}^{T}$); Algorithm 2 STARUPDATE($\Upsilon$, $C$, $R$); Algorithm 3 GF.BOOST($R$, $J$, $T$) |
| Open Source Code | Yes | Our code is provided and commented in Appendix, Section V.2. |
| Open Datasets | Yes | We carried out experiments on a total of 21 datasets, either simulated or taken from UCI [10], Kaggle, OpenML, or the Stanford Open Policing Project. All are presented in Appendix, Section V.1. |
| Dataset Splits | Yes | The evaluation pipeline is simple: we create for each domain a 5-fold stratified experiment. |
| Hardware Specification | Yes | We ran part of the experiments on a MacBook Pro (16 GB RAM, 2 GHz quad-core Intel Core i5 processor) and part on an Intel(R) Xeon(R) 3.70 GHz desktop with 12 cores and 64 GB RAM. |
| Software Dependencies | Yes | MICE: we used the R MICE package v3.13.0 with two choices of method for the round-robin (column-wise) prediction of missing values: CART [1] and random forests (RF) [42]. |
| Experiment Setup | Yes | In Table 5, contenders are parameterized as follows: ARFs learn sets of 200 trees. CT-GANs are trained for 1 000 epochs. Forest Flows and VCAEs are run with default parameters. We optimized MICE by choosing trees (CART) and random forests (RFs; we increased the number of trees to 100 for better results) as supervised models. |
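The Dataset Splits entry describes a 5-fold stratified evaluation. As a rough illustration of what such a split does, here is a minimal standard-library Python sketch. It assumes stratification on a single categorical label column, and the helper name `stratified_kfold` and its `seed` parameter are hypothetical. This is not the authors' code (which is in their Appendix, Section V.2), and their shuffling or stratification details may differ.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Return k (train_idx, test_idx) pairs whose test folds preserve class proportions.

    Hypothetical illustration only; assumes `labels` is one categorical value per row.
    """
    rng = random.Random(seed)

    # Group row indices by class label (dict keeps first-appearance order).
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    # Shuffle each class, then deal its rows round-robin into the k folds,
    # continuing from where the previous class stopped so fold sizes stay balanced.
    folds = [[] for _ in range(k)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset = (offset + len(idx)) % k

    # Each fold serves once as the test set; all remaining rows form the training set.
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    labels = ["a"] * 10 + ["b"] * 5
    for train, test in stratified_kfold(labels, k=5):
        print(len(train), [labels[i] for i in test])
```

With 10 rows of class `a` and 5 of class `b`, each of the five test folds receives two `a` rows and one `b` row, so every fold mirrors the 2:1 class ratio of the full dataset.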