Blossom: an Anytime Algorithm for Computing Optimal Decision Trees

Authors: Emir Demirović, Emmanuel Hebrard, Louis Jean

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments show that whereas existing exact methods hardly scale to deep trees, this algorithm learns trees comparable to standard heuristics without computational overhead, and can significantly improve their accuracy when given more computation time, even for deep trees. 5. Experimental Results All experiments were run on 4 cluster nodes, each with 36 Intel Xeon CPU E5-2695 v4 2.10GHz cores running Linux Ubuntu 16.04.4.
Researcher Affiliation Collaboration Emir Demirovi c * 1 Emmanuel Hebrard * 2 Louis Jean * 3 1TU DELFT, The Netherlands 2LAAS-CNRS, Universit e de Toulouse, CNRS, France 3Jolibrain, Toulouse, France. Correspondence to: Emir Demirovi c <e.demirovic@tudelft.nl>, Emmanuel Hebrard <hebrard@laas.fr>, Louis Jean <louis.jean@jolibrain.com>.
Pseudocode Yes Algorithm 1: Dynamic Programming Algorithm Algorithm: DP Data: N, P, F, k Result: Min. error of k-bounded trees on data set N, P and Algorithm 2: Blossom Algorithm Algorithm: Blossom Data: N, P, k Result: Min. error of k-bounded trees on data set N, P
Open Source Code Yes The source code for Blossom is available at https://gitlab.laas.fr/ehebrard/blossom
Open Datasets Yes We used a collection of 58 data sets from related work (Aglin et al., 2020; Narodytska et al., 2018; Verwer & Zhang, 2019), plus 8 large extra data sets from Kaggle and UCI. The taiwan binarised and adult discretized data sets come from the UCI repository and were discretized using ad-hoc threshold on continuous or large numerical features, and one-hot encoding for symbolic features.
Dataset Splits Yes For each data set we generated 50 random samples of 20% of the datapoints to be used for validation, while the rest was used for training, and we report the averages.
Hardware Specification Yes All experiments were run on 4 cluster nodes, each with 36 Intel Xeon CPU E5-2695 v4 2.10GHz cores running Linux Ubuntu 16.04.4.
Software Dependencies Yes Sources were compiled using g++8.2
Experiment Setup Yes Every algorithm was run until completion or until reaching a time limit of one hour, and within a memory limit of 50GB. Given the large number of runs, we stopped Blossom after 10 minutes in each run, and kept the last tree it returned.