SAT-based Decision Tree Learning for Large Data Sets
Authors: André Schidler, Stefan Szeider (pp. 3904-3912)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our new approach experimentally on a range of real-world instances that contain up to several thousand samples. In almost all cases, our method successfully decreases the depth of the initial decision tree; often, the decrease is significant. |
| Researcher Affiliation | Academia | André Schidler and Stefan Szeider, Algorithms and Complexity Group, TU Wien, Vienna, Austria; aschidler@ac.tuwien.ac.at, sz@ac.tuwien.ac.at |
| Pseudocode | Yes | The pseudo-code for DT-SLIM(H) is shown in Algorithm 1. |
| Open Source Code | Yes | Source code can be found at https://github.com/ASchidler/decisiontree and results at https://doi.org/10.5281/zenodo.4571570. |
| Open Datasets | Yes | We take all classification instances from the UCI Machine Learning Repository that use discrete domains and contain more than 500 samples. [...] Additionally, we take the instances from Narodytska et al. (2018) [...] For all the other instances we use 5-fold stratified cross-validation (also known as rotation estimation (Breiman et al. 1984; Kohavi 1995)) and report the average. |
| Dataset Splits | Yes | For all the other instances we use 5-fold stratified cross-validation (also known as rotation estimation (Breiman et al. 1984; Kohavi 1995)) and report the average. |
| Hardware Specification | Yes | We use servers with two Intel Xeon E5-2640 v4 CPUs running at 2.40 GHz and using Ubuntu 18.04. |
| Software Dependencies | Yes | We use the SAT solver Glucose 4.1 and the well-established decision tree inducers ITI 3.1 and Weka 3.8.4. |
| Experiment Setup | Yes | From the results we derive three sets of parameters: (i) d_60 = 12 and c_60 ranges from 70 to 15, (ii) d_300 = 15 and c_300 ranges from 300 to 90, and (iii) d_800 = 39 and c_800 ranges from 500 to 105. |
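The evaluation protocol quoted above (5-fold stratified cross-validation with averaged results) can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the authors use SAT-based depth minimization on top of ITI/Weka trees, whereas this sketch assumes scikit-learn's off-the-shelf `StratifiedKFold` and `DecisionTreeClassifier` on a synthetic dataset.

```python
# Sketch of 5-fold stratified cross-validation, averaging test accuracy.
# Assumptions: scikit-learn is available; the dataset is synthetic
# (the paper uses discrete-domain UCI instances with > 500 samples).
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Stratified folds preserve the class ratio in every train/test split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in skf.split(X, y):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    accuracies.append(clf.score(X[test_idx], y[test_idx]))

mean_accuracy = sum(accuracies) / len(accuracies)
print(f"mean accuracy over 5 folds: {mean_accuracy:.3f}")
```

Reporting the mean over the five folds matches the paper's "report the average" phrasing; stratification matters for the imbalanced class distributions common in UCI classification instances.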