Blossom: an Anytime Algorithm for Computing Optimal Decision Trees
Authors: Emir Demirović, Emmanuel Hebrard, Louis Jean
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that whereas existing exact methods hardly scale to deep trees, this algorithm learns trees comparable to standard heuristics without computational overhead, and can significantly improve their accuracy when given more computation time, even for deep trees. 5. Experimental Results All experiments were run on 4 cluster nodes, each with 36 Intel Xeon CPU E5-2695 v4 2.10GHz cores running Linux Ubuntu 16.04.4. |
| Researcher Affiliation | Collaboration | Emir Demirovi c * 1 Emmanuel Hebrard * 2 Louis Jean * 3 1TU DELFT, The Netherlands 2LAAS-CNRS, Universit e de Toulouse, CNRS, France 3Jolibrain, Toulouse, France. Correspondence to: Emir Demirovi c <e.demirovic@tudelft.nl>, Emmanuel Hebrard <hebrard@laas.fr>, Louis Jean <louis.jean@jolibrain.com>. |
| Pseudocode | Yes | Algorithm 1: Dynamic Programming Algorithm Algorithm: DP Data: N, P, F, k Result: Min. error of k-bounded trees on data set N, P and Algorithm 2: Blossom Algorithm Algorithm: Blossom Data: N, P, k Result: Min. error of k-bounded trees on data set N, P |
| Open Source Code | Yes | The source code for Blossom is available at https://gitlab.laas.fr/ehebrard/blossom |
| Open Datasets | Yes | We used a collection of 58 data sets from related work (Aglin et al., 2020; Narodytska et al., 2018; Verwer & Zhang, 2019), plus 8 large extra data sets from Kaggle and UCI. The taiwan binarised and adult discretized data sets come from the UCI repository and were discretized using ad-hoc threshold on continuous or large numerical features, and one-hot encoding for symbolic features. |
| Dataset Splits | Yes | For each data set we generated 50 random samples of 20% of the datapoints to be used for validation, while the rest was used for training, and we report the averages. |
| Hardware Specification | Yes | All experiments were run on 4 cluster nodes, each with 36 Intel Xeon CPU E5-2695 v4 2.10GHz cores running Linux Ubuntu 16.04.4. |
| Software Dependencies | Yes | Sources were compiled using g++8.2 |
| Experiment Setup | Yes | Every algorithm was run until completion or until reaching a time limit of one hour, and within a memory limit of 50GB. Given the large number of runs, we stopped Blossom after 10 minutes in each run, and kept the last tree it returned. |