A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Authors: Haoran Zhu, Pavankumar Murali, Dzung Phan, Lam Nguyen, Jayant Kalagnanam

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using 36 data-sets from the University of California Irvine Machine Learning Repository, we demonstrate that our formulation outperforms its counterparts in the literature by an average of about 10% in terms of mean out-of-sample testing accuracy across the data-sets."
Researcher Affiliation | Industry | "Haoran Zhu, Pavankumar Murali, Dzung T. Phan, Lam M. Nguyen, Jayant R. Kalagnanam. IBM Research, Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA. haoran@ibm.com, pavanm@us.ibm.com, phandu@us.ibm.com, LamNguyen.MLTD@ibm.com, jayant@us.ibm.com"
Pseudocode | Yes | "Algorithm 1: LP-based data-selection in each cluster N"
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code or a link to a code repository for the methodology described.
Open Datasets | Yes | "For benchmarking, we use data-sets from the UCI Machine Learning Repository [9]."
Dataset Splits | Yes | "We used the same training-testing split as in [4], which is 75% of the entire data-set as the training set and the remaining 25% as the testing set, with 5-fold cross-validation." (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory, or cloud instance types).
Software Dependencies | Yes | "We implemented all these MIP approaches in Python 3.6 and solved them using CPLEX 12.9.0 [8]. We invoked the Decision Tree Classifier implementation from scikit-learn to train a decision tree using CART, using default parameter settings." (See the CART sketch after the table.)
Experiment Setup | Yes | "The time limit was set to be 15 or 30 minutes for medium-sized data-sets, and for larger data-sets we increased it up to 4 hours to ensure that the solver could make sufficient progress. For all methods, the maximum tree depth was set to be the same as our SVM1-ODT. For solving S1O and OCT-H, we use the decision tree trained using CART as a warm start solution for CPLEX, as in [4, 24]. In Algorithm 1, T(λ) is the transformation we used in the proof of Theorem 4 to construct a feasible solution for (6), and β1, β2 are some pre-specified threshold parameters." (See the CPLEX sketch after the table.)
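
The split protocol in the "Dataset Splits" row is straightforward to reproduce. Below is a minimal sketch, assuming standard scikit-learn utilities and a stand-in data-set; the paper reuses the exact splits of [4] and does not publish its splitting code, so the data-set and random seed here are placeholders.

```python
# Sketch of the 75%/25% split with 5-fold cross-validation described above.
# load_iris stands in for a UCI data-set; random_state=0 is an assumption.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, KFold

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # 75% training / 25% testing

# 5-fold cross-validation over the training portion, e.g. for parameter tuning.
for tr_idx, val_idx in KFold(n_splits=5, shuffle=True,
                             random_state=0).split(X_train):
    X_tr, X_val = X_train[tr_idx], X_train[val_idx]
    y_tr, y_val = y_train[tr_idx], y_train[val_idx]
```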
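The CART baseline in the "Software Dependencies" row maps directly onto scikit-learn. A minimal sketch, continuing the variables from the split sketch above; max_depth is an assumption, reflecting the paper's statement that every method shares the maximum tree depth of SVM1-ODT, while all other parameters stay at their defaults as quoted.

```python
# Sketch of the CART baseline trained with scikit-learn, as quoted above.
# max_depth=2 is a hypothetical value; the paper matches it to SVM1-ODT's depth.
from sklearn.tree import DecisionTreeClassifier

cart = DecisionTreeClassifier(max_depth=2)
cart.fit(X_train, y_train)
print("CART out-of-sample accuracy:", cart.score(X_test, y_test))
```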
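Finally, the solver settings in the "Experiment Setup" row amount to a time-limit parameter and a MIP start in CPLEX 12.9.0. The sketch below uses the legacy cplex Python API; the model file "svm1_odt.lp" and the variable names and values are hypothetical, since encoding a CART tree as a feasible SVM1-ODT solution depends on the formulation's variables, which are not reproduced here.

```python
# Sketch of the CPLEX time limit and CART warm start described above.
# "svm1_odt.lp" and the variable names/values are hypothetical placeholders.
import cplex

model = cplex.Cplex("svm1_odt.lp")

# 15-minute limit (900 s) for medium-sized data-sets; up to 4 hours for larger ones.
model.parameters.timelimit.set(900)

# Warm start: supply the CART tree, encoded as values of the MIP variables,
# as an initial solution; effort_level.auto lets CPLEX repair or complete it.
model.MIP_starts.add(
    cplex.SparsePair(ind=["a_1", "b_1"], val=[0.5, 1.0]),
    model.MIP_starts.effort_level.auto,
    "cart_warm_start",
)

model.solve()
print("Best objective:", model.solution.get_objective_value())
```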