Scalable Optimal Multiway-Split Decision Trees with Constraints

Authors: Shivaram Subramanian, Wei Sun

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate its efficacy with extensive experiments. We present results on datasets containing up to 1,008,372 samples while existing MIP-based decision tree models do not scale well on data beyond a few thousand points."
Researcher Affiliation | Industry | IBM Research, Yorktown Heights, New York, USA. {subshiva, sunw}@us.ibm.com
Pseudocode | No | The paper describes the column generation procedure and the K-shortest-path heuristic verbally but does not include a formal pseudocode or algorithm block.
Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets | Yes | "We evaluate the same 12 classification datasets from the UCI repository (Dua and Graff 2017)..." UCI Machine Learning Repository: http://archive.ics.uci.edu/ml
Dataset Splits | Yes | "We create 5 random splits for each dataset into training (50%), validation (25%), and test sets (25%)."
Hardware Specification | Yes | "All experiments were run on an Intel 8-core i7 PC with 32GB RAM."
Software Dependencies | Yes | "CPLEX 20.1 (CPLEX 2021) was used to solve the MIP-based methods."
Experiment Setup | Yes | "The minimum number of samples per rule for OMT was set to 1% of the training data. The maximum value for K was set to 1000 in the KSP for all instances except the large dataset experiments, where it was reduced to 100 to stay within the RAM limit. We set a maximum CG iteration limit of 40 and an L̂ limit of 10,000. All experiments were run on an Intel 8-core i7 PC with 32GB RAM."
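The Dataset Splits row quotes a 5-fold random 50%/25%/25% train/validation/test protocol. A minimal sketch of how such splits could be generated is shown below; the function name, the use of index lists, and the fixed seed are illustrative assumptions, not the authors' code (which is not publicly released).

```python
import random

def make_splits(n_samples, n_splits=5, seed=0):
    """Generate random 50/25/25 train/validation/test index splits.

    Hypothetical reconstruction of the paper's protocol: for each of
    n_splits repetitions, shuffle all sample indices and partition them
    into 50% training, 25% validation, and 25% test.
    """
    rng = random.Random(seed)  # fixed seed for repeatability (assumption)
    splits = []
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        n_train = n_samples // 2   # 50% training
        n_val = n_samples // 4     # 25% validation; remainder is test
        train = idx[:n_train]
        val = idx[n_train:n_train + n_val]
        test = idx[n_train + n_val:]
        splits.append((train, val, test))
    return splits
```

Each of the five splits partitions the full index set, so every sample appears in exactly one of the three subsets per repetition.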