SAT-based Decision Tree Learning for Large Data Sets

Authors: André Schidler, Stefan Szeider (pp. 3904-3912)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our new approach experimentally on a range of real-world instances that contain up to several thousand samples. In almost all cases, our method successfully decreases the depth of the initial decision tree; often, the decrease is significant."
Researcher Affiliation | Academia | "André Schidler and Stefan Szeider, Algorithms and Complexity Group, TU Wien, Vienna, Austria. aschidler@ac.tuwien.ac.at, sz@ac.tuwien.ac.at"
Pseudocode | Yes | "The pseudo-code for DT-SLIM(H) is shown in Algorithm 1." (A hedged sketch of the local-improvement idea follows the table.)
Open Source Code | Yes | "Source code can be found at https://github.com/ASchidler/decisiontree and results at https://doi.org/10.5281/zenodo.4571570."
Open Datasets | Yes | "We take all classification instances from the UCI Machine Learning Repository that use discrete domains and contain more than 500 samples. [...] Additionally, we take the instances from Narodytska et al. (2018) [...] For all the other instances we use 5-fold stratified cross-validation (also known as rotation estimation (Breiman et al. 1984; Kohavi 1995)) and report the average."
Dataset Splits | Yes | "For all the other instances we use 5-fold stratified cross-validation (also known as rotation estimation (Breiman et al. 1984; Kohavi 1995)) and report the average." (See the cross-validation sketch below.)
Hardware Specification | Yes | "We use servers with two Intel Xeon E5-2640 v4 CPUs running at 2.40 GHz and using Ubuntu 18.04."
Software Dependencies | Yes | "We use the SAT solver Glucose 4.1 and the well-established decision tree inducers ITI 3.1 and Weka 3.8.4."
Experiment Setup | Yes | "From the results we derive three sets of parameters: (i) d_60 = 12 and c_60 ranges from 70 to 15, (ii) d_300 = 15 and c_300 ranges from 300 to 90, and (iii) d_800 = 39 and c_800 ranges from 500 to 105." (See the parameter-set snippet below.)
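
The table only quotes the existence of Algorithm 1, so the following is a minimal Python sketch of the general idea behind SAT-based local improvement of decision trees: repeatedly carve out a subtree whose samples fit a budget c, recompute a depth-minimal replacement, and splice it back in if it is strictly shallower. All names, the tuple-based tree encoding, and the exhaustive optimal_tree search (which stands in for the paper's SAT encoding) are illustrative assumptions, not the authors' implementation.

    from collections import Counter

    # Trees are encoded as nested tuples: an internal node is
    # (feature_index, low_child, high_child); a leaf is a bare class label.
    def depth(t):
        return 1 + max(depth(t[1]), depth(t[2])) if isinstance(t, tuple) else 0

    def optimal_tree(samples, feats):
        """Exhaustively search for a depth-minimal tree over `samples`.

        This brute-force search stands in for the paper's SAT encoding and
        is only feasible for the small local instances the budget carves out.
        """
        labels = [lab for _, lab in samples]
        if len(set(labels)) == 1:          # pure: a single leaf suffices
            return labels[0], 0
        best = None
        for f in feats:
            lo = [s for s in samples if not s[0][f]]
            hi = [s for s in samples if s[0][f]]
            if not lo or not hi:           # feature does not split the samples
                continue
            lt, ld = optimal_tree(lo, feats - {f})
            ht, hd = optimal_tree(hi, feats - {f})
            if best is None or 1 + max(ld, hd) < best[1]:
                best = ((f, lt, ht), 1 + max(ld, hd))
        if best is None:                   # contradictory samples: majority leaf
            return Counter(labels).most_common(1)[0][0], 0
        return best

    def improve(t, samples, c, n_feats):
        """Replace any subtree reached by at most c samples if the exact
        search finds a strictly shallower equivalent."""
        if not isinstance(t, tuple):
            return t
        if 0 < len(samples) <= c:
            new, d = optimal_tree(samples, set(range(n_feats)))
            if d < depth(t):
                return new
        f, lo, hi = t
        return (f,
                improve(lo, [s for s in samples if not s[0][f]], c, n_feats),
                improve(hi, [s for s in samples if s[0][f]], c, n_feats))

    # Toy run: the label is XOR of features 0 and 1; feature 2 is noise. A
    # hand-written initial tree wastes its root split on feature 2 (depth 3);
    # local improvement recovers a depth-2 tree.
    samples = [((a, b, n), a ^ b) for a in (0, 1) for b in (0, 1) for n in (0, 1)]
    deep = (2, (0, (1, 0, 1), (1, 1, 0)), (0, (1, 0, 1), (1, 1, 0)))
    print(depth(deep), "->", depth(improve(deep, samples, c=10, n_feats=3)))  # 3 -> 2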
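For the evaluation protocol, here is a minimal sketch of 5-fold stratified cross-validation with averaged results, as the quoted passage describes. scikit-learn's DecisionTreeClassifier and the bundled data set merely stand in for the paper's actual pipeline (heuristic induction with ITI or Weka followed by DT-SLIM).

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)  # stand-in data set

    accs, depths = [], []
    for train, test in StratifiedKFold(n_splits=5, shuffle=True,
                                       random_state=0).split(X, y):
        clf = DecisionTreeClassifier(random_state=0).fit(X[train], y[train])
        accs.append(clf.score(X[test], y[test]))
        depths.append(clf.get_depth())

    # As in the paper, report the average over the five folds.
    print(f"mean accuracy: {sum(accs) / len(accs):.3f}")
    print(f"mean depth:    {sum(depths) / len(depths):.1f}")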
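Finally, the three parameter sets from the Experiment Setup quote could be recorded as plain configuration data. Reading d as a fixed depth budget and c as a range that shrinks over the run is an interpretation of the quote, and the 60/300/800 subscripts are kept as opaque keys since the section does not explain them.

    # Hypothetical transcription of the quoted parameter sets; key names and
    # the meaning of the 60/300/800 subscripts are assumptions.
    PARAMETER_SETS = {
        60:  {"d": 12, "c_from": 70,  "c_to": 15},   # (i)
        300: {"d": 15, "c_from": 300, "c_to": 90},   # (ii)
        800: {"d": 39, "c_from": 500, "c_to": 105},  # (iii)
    }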