Born-Again Tree Ensembles

Authors: Thibaut Vidal, Maximilian Schiffer

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present numerical studies which allow to analyze the characteristics of the born-again trees in terms of interpretability and accuracy. Further, these studies show that our algorithm is amenable to a wide range of real-world data sets.
Researcher Affiliation | Academia | 1 Department of Computer Science, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil. 2 TUM School of Management, Technical University of Munich, Munich, Germany.
Pseudocode | Yes | Algorithm 1 BORN-AGAIN(z^L, z^R). (A hedged Python sketch of this recursion is given below the table.)
Open Source Code | Yes | Detailed computational results, data, and source codes are available in the supplementary material and at the following address: https://github.com/vidalt/BA-Trees.
Open Datasets | Yes | We focus on a set of six datasets from the UCI machine learning repository and from previous work by Smith et al. (1988) and Hu et al. (2019).
Dataset Splits | Yes | To obtain discrete numerical features, we used one-hot encoding on categorical data and binned continuous features into ten ordinal scales. Then, we generated training and test samples for all data sets using a ten-fold cross validation.
Hardware Specification | Yes | All our experiments were run on a single thread of an Intel(R) Xeon(R) CPU E5-2620 v4 2.10GHz, with 128GB of available RAM, running CentOS v7.7.
Software Dependencies | Yes | The DP algorithm was implemented in C++ and compiled with GCC 9.2.0 using flag -O3, whereas the original random forests were generated in Python (using scikit-learn v0.22.1).
Experiment Setup | Yes | Finally, for each fold and each dataset, we generated a random forest composed of ten trees with a maximum depth of three (i.e., eight leaves at most), considering p/2 random candidate features at each split. (A scikit-learn sketch of this data preparation and forest setup also follows below.)
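
The Pseudocode row refers to the paper's Algorithm 1, a memoized recursion BORN-AGAIN(z^L, z^R) over axis-aligned regions of the discretized feature space. The following is a minimal sketch of that recursion for the depth-minimizing objective as we read it, not the authors' C++ implementation: `born_again_depth` and `forest_predict` are hypothetical names, the homogeneity check enumerates every cell by brute force (the paper derives much cheaper tests), and none of the paper's pruning or filtering refinements are included.

```python
from functools import lru_cache
from itertools import product

def born_again_depth(z_L, z_R, forest_predict):
    """Minimum depth of a decision tree reproducing forest_predict on the
    box [z_L, z_R] of the discretized feature space.

    z_L, z_R: tuples of ints (lower/upper corner of the region).
    forest_predict(cell): hypothetical stand-in for the majority vote of
    the original random forest on a single cell.
    """
    @lru_cache(maxsize=None)
    def phi(zl, zr):
        # Base case: the forest is constant on the region, so a single leaf suffices.
        # (Brute-force check over all cells; the paper uses far cheaper tests.)
        cells = product(*(range(lo, hi + 1) for lo, hi in zip(zl, zr)))
        if len({forest_predict(c) for c in cells}) == 1:
            return 0
        best = float("inf")
        # Otherwise, try every feature j and every split level inside the region
        # and recurse on the two sub-regions it induces.
        for j, (lo, hi) in enumerate(zip(zl, zr)):
            for level in range(lo, hi):
                left_zr = zr[:j] + (level,) + zr[j + 1:]
                right_zl = zl[:j] + (level + 1,) + zl[j + 1:]
                best = min(best, 1 + max(phi(zl, left_zr), phi(right_zl, zr)))
        return best

    return phi(tuple(z_L), tuple(z_R))

# Toy usage: two binary features, the "forest" predicts 1 iff both agree.
print(born_again_depth((0, 0), (1, 1), lambda cell: int(cell[0] == cell[1])))  # -> 2
```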
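
The Dataset Splits and Experiment Setup rows describe how the original forests were produced. Below is a sketch of that protocol against the scikit-learn 0.22 API cited in the Software Dependencies row; the function names, the use of KBinsDiscretizer for the ten-level binning, and the quantile strategy are illustrative assumptions, not the authors' actual preprocessing code.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier

def make_features(X_continuous, X_categorical):
    """Bin continuous features into ten ordinal levels and one-hot encode
    categorical ones, following the Dataset Splits description."""
    binner = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
    onehot = OneHotEncoder(sparse=False, handle_unknown="ignore")  # sparse= is the 0.22-era argument
    return np.hstack([binner.fit_transform(X_continuous),
                      onehot.fit_transform(X_categorical)])

def forests_per_fold(X, y, seed=0):
    """Ten-fold cross validation; for each fold, one forest of ten trees with
    maximum depth three and p/2 candidate features per split (max_features=0.5)."""
    results = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=seed).split(X):
        rf = RandomForestClassifier(n_estimators=10, max_depth=3,
                                    max_features=0.5, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        results.append((rf, rf.score(X[test_idx], y[test_idx])))
    return results
```

In the pipeline described by the paper, each fitted forest would then be passed to the C++ dynamic program to construct its born-again tree.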