Learning Optimal Classification Trees Using a Binary Linear Program Formulation
Authors: Sicco Verwer, Yingqian Zhang1625-1632
AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that our formulation achieves better performance than existing formulations on both small and large problem instances within shorter running time. We test our method on benchmark datasets from the UCI machine learning repository (Lichman 2013). We compare the performance of Bin OCT with OCT (Bertsimas and Dunn 2017) and DTIPs (Verwer and Zhang 2017), two recently proposed ILP-based classification algorithms. We use the accuracy in the cited papers for comparison. |
| Researcher Affiliation | Academia | Sicco Verwer Delft University of Technology The Netherlands s.e.verwer@tudelft.nl Yingqian Zhang Eindhoven University of Technology The Netherlands yqzhang@tue.nl |
| Pseudocode | Yes | Algorithm 1: Obtaining the binary encoding value ranges |
| Open Source Code | Yes | Our implemented models, the code, and the used training and testing data sets are available online at https://github.com/Sicco Verwer/binoct. |
| Open Datasets | Yes | We test our method on benchmark datasets from the UCI machine learning repository (Lichman 2013). The datasets used in the experiments are listed in Table 4. |
| Dataset Splits | No | For a given dataset, 50% of the dataset are used for training and 25% for testing. As we do not have hyperparameters to tune in our model, the remaining 25% are not used. The split is down randomly five times. |
| Hardware Specification | Yes | The resulting Bin OCT model is passed to the optimization solver CPLEX 12.8.0, running on an AMD Ryzen machine with 16GB RAM, which returns the best solution (i.e., a classification tree with highest accuracy) it can find within the given time limit. |
| Software Dependencies | Yes | The resulting Bin OCT model is passed to the optimization solver CPLEX 12.8.0, running on an AMD Ryzen machine with 16GB RAM... We also run CART from sciki-learn with its default parameter setting. |
| Experiment Setup | Yes | We provide CPLEX with a priority order such that variables closer to the root of the tree get solved first. We also run CART from sciki-learn with its default parameter setting (i.e., criterion=gini, splitter=best, min samples split=2, min samples leaf=1, max leaf nodes=None), except that the maximum depths of the trees generated by CART are set to the same depths as Bin OCT. The time limit for Bin OCT is set to 10 minutes for each instance. In addition, as in (Bertsimas and Dunn 2017; Verwer and Zhang 2017), Bin OCT is solved by CPLEX with warm starts learned from CART to investigate whether this helps find better solutions. We name it Bin OCT*. |