reproducibilityindex.ai

Learning Optimal Classification Trees Using a Binary Linear Program Formulation

Authors: Sicco Verwer, Yingqian Zhang1625-1632

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We show experimentally that our formulation achieves better performance than existing formulations on both small and large problem instances within shorter running time. We test our method on benchmark datasets from the UCI machine learning repository (Lichman 2013). We compare the performance of Bin OCT with OCT (Bertsimas and Dunn 2017) and DTIPs (Verwer and Zhang 2017), two recently proposed ILP-based classiﬁcation algorithms. We use the accuracy in the cited papers for comparison.
Researcher Affiliation	Academia	Sicco Verwer Delft University of Technology The Netherlands s.e.verwer@tudelft.nl Yingqian Zhang Eindhoven University of Technology The Netherlands yqzhang@tue.nl
Pseudocode	Yes	Algorithm 1: Obtaining the binary encoding value ranges
Open Source Code	Yes	Our implemented models, the code, and the used training and testing data sets are available online at https://github.com/Sicco Verwer/binoct.
Open Datasets	Yes	We test our method on benchmark datasets from the UCI machine learning repository (Lichman 2013). The datasets used in the experiments are listed in Table 4.
Dataset Splits	No	For a given dataset, 50% of the dataset are used for training and 25% for testing. As we do not have hyperparameters to tune in our model, the remaining 25% are not used. The split is down randomly ﬁve times.
Hardware Specification	Yes	The resulting Bin OCT model is passed to the optimization solver CPLEX 12.8.0, running on an AMD Ryzen machine with 16GB RAM, which returns the best solution (i.e., a classiﬁcation tree with highest accuracy) it can ﬁnd within the given time limit.
Software Dependencies	Yes	The resulting Bin OCT model is passed to the optimization solver CPLEX 12.8.0, running on an AMD Ryzen machine with 16GB RAM... We also run CART from sciki-learn with its default parameter setting.
Experiment Setup	Yes	We provide CPLEX with a priority order such that variables closer to the root of the tree get solved ﬁrst. We also run CART from sciki-learn with its default parameter setting (i.e., criterion=gini, splitter=best, min samples split=2, min samples leaf=1, max leaf nodes=None), except that the maximum depths of the trees generated by CART are set to the same depths as Bin OCT. The time limit for Bin OCT is set to 10 minutes for each instance. In addition, as in (Bertsimas and Dunn 2017; Verwer and Zhang 2017), Bin OCT is solved by CPLEX with warm starts learned from CART to investigate whether this helps ﬁnd better solutions. We name it Bin OCT*.