Learning Optimal Decision Trees with MaxSAT and its Integration in AdaBoost
Authors: Hao Hu, Mohamed Siala, Emmanuel Hebrard, Marie-José Huguet
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform three experiments on datasets from CP4IM. The first experiment aims at highlighting the overfitting behaviour of the SAT approach. In the second experiment we compare the performance with the state-of-the-art heuristic (CART) and exact (DL8.5) methods. Finally, we evaluate the MaxSAT implementation of boosting and bagging. The datasets are binarized with one-hot encoding. |
| Researcher Affiliation | Academia | LAAS-CNRS, Université de Toulouse, CNRS, INSA, Toulouse, France |
| Pseudocode | No | The paper describes methods and formulas using mathematical equations and textual explanations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code and datasets are available online at https://gepgitlab.laas.fr/hhu/maxsat-decision-trees. |
| Open Datasets | Yes | We perform three experiments on datasets from CP4IM (https://dtai.cs.kuleuven.be/CP4IM/datasets/). The datasets are binarized with one-hot encoding. |
| Dataset Splits | Yes | We use the hold-out method to split each dataset into training and testing sets. Following [Narodytska et al., 2018], we choose (only in the first experiment) three small splitting ratios r = {0.05, 0.1, 0.2} to generate training sets; the remaining examples are used for testing. This process is repeated 10 times using different randomisation seeds. In the second experiment, we use stratified sampling to preserve the class distribution with 5-fold cross-validation, repeated with 10 different randomisation seeds for each dataset. In the third experiment, we use the hold-out method with ratio r = 0.8 to split each dataset into training and testing sets, again repeated with 10 different randomisation seeds. (A sketch of this splitting protocol appears after the table.) |
| Hardware Specification | Yes | We ran all experiments on a Xeon E5-2695 v3 @ 2.30GHz CPU running Ubuntu 16.04.6 LTS. |
| Software Dependencies | Yes | We use the RC2 MaxSAT solver [Ignatiev et al., 2019]. We use the MaxSAT solver Loandra [Berg et al., 2019; Demirović and Stuckey, 2019]... DL8.5 [Aglin et al., 2020] as a state-of-the-art exact approach via its Python package (version 0.0.9). (A minimal RC2 usage sketch appears after the table.) |
| Experiment Setup | Yes | Their parameters are kept to their default values, except for the maximum depth, which is a fixed parameter for all the models. We choose three different maximum depths d ∈ {2, 3, 4} for CART and DL8.5. The timeout for training is set to 15 minutes and the memory limit to 16GB for all the experiments in this section. The core-guided phase in Loandra is limited to 10 minutes. The number of learners (for boosting and bagging) is set to 21. (A configuration sketch appears after the table.) |
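
The Dataset Splits row describes three distinct protocols. Below is a minimal sketch of how those splits could be reproduced with scikit-learn; the placeholder `X`/`y` arrays stand in for a one-hot-encoded CP4IM dataset, and the loading code is an assumption, not part of the paper's artifacts.

```python
# Sketch of the three splitting protocols quoted above (scikit-learn).
# X, y are placeholders for a one-hot-encoded CP4IM dataset.
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold

X = np.random.randint(0, 2, size=(200, 20))  # placeholder binary features
y = np.random.randint(0, 2, size=200)        # placeholder binary labels

# Experiment 1: small hold-out training ratios, 10 random seeds.
for r in (0.05, 0.1, 0.2):
    for seed in range(10):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=r, random_state=seed)
        # ... train and evaluate here ...

# Experiment 2: stratified 5-fold cross-validation, 10 seeds.
# Stratification is stated in the quote only for this protocol.
for seed in range(10):
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(X, y):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # ... train and evaluate here ...

# Experiment 3: hold-out with ratio r = 0.8 for training, 10 seeds.
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=0.8, random_state=seed)
```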
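The RC2 solver cited in the Software Dependencies row ships with the PySAT library. The snippet below is a generic weighted-MaxSAT example of driving RC2; it is not the paper's decision-tree encoding, which lives in the authors' repository.

```python
# Minimal RC2 (PySAT) usage sketch: a generic MaxSAT instance,
# not the paper's decision-tree encoding.
from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

wcnf = WCNF()
wcnf.append([1, 2])           # hard clause: x1 OR x2 must hold
wcnf.append([-1], weight=1)   # soft clause: prefer x1 false
wcnf.append([-2], weight=1)   # soft clause: prefer x2 false

with RC2(wcnf) as solver:
    model = solver.compute()              # an optimal model, or None if UNSAT
    print(model, "cost:", solver.cost)    # cost = total weight of falsified softs
```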
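Finally, a sketch of the baseline and ensemble configuration described in the last two rows. It assumes the cited `dl85` Python package exposes a `DL85Classifier` with `max_depth` and `time_limit` parameters (hedged for version 0.0.9), and it uses CART as the boosting base learner purely for illustration; the paper's actual MaxSAT tree learner is in the linked repository.

```python
# Sketch of the reported baseline/ensemble configuration (assumptions noted above).
from sklearn.tree import DecisionTreeClassifier            # CART baseline
from sklearn.ensemble import AdaBoostClassifier
from dl85 import DL85Classifier                            # exact DL8.5 baseline (assumed API)

# Fixed maximum depths d in {2, 3, 4}; all other parameters left at defaults.
for d in (2, 3, 4):
    cart = DecisionTreeClassifier(max_depth=d)
    dl85 = DL85Classifier(max_depth=d, time_limit=900)     # 15-minute training timeout

# 21 learners for boosting, mirroring the reported ensemble size; CART stands in
# for the paper's MaxSAT trees here. (`estimator` is the scikit-learn >= 1.2
# keyword; older versions use `base_estimator`.)
booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2), n_estimators=21)
```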