Scalable Optimal Multiway-Split Decision Trees with Constraints
Authors: Shivaram Subramanian, Wei Sun
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its efficacy with extensive experiments. We present results on datasets containing up to 1,008,372 samples while existing MIP-based decision tree models do not scale well on data beyond a few thousand points. |
| Researcher Affiliation | Industry | IBM Research, Yorktown Heights, New York, USA {subshiva, sunw}@us.ibm.com |
| Pseudocode | No | The paper describes the column generation procedure and K-shortest path heuristic verbally but does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate the same 12 classification datasets from the UCI repository (Dua and Graff 2017)... UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. |
| Dataset Splits | Yes | We create 5 random splits for each dataset into training (50%), validation (25%), and test sets (25%). |
| Hardware Specification | Yes | All experiments were run on an Intel 8-core i7 PC with 32GB RAM. |
| Software Dependencies | Yes | CPLEX 20.1 (CPLEX 2021) was used to solve the MIP-based methods. |
| Experiment Setup | Yes | The minimum number of samples per rule for OMT was set to 1% of the training data. The maximum value for K was set to 1000 in the KSP for all instances except the large dataset experiments, where it was reduced to 100 to stay within the RAM limit. We set a maximum CG iteration limit of 40 and an L̂ limit of 10,000. All experiments were run on an Intel 8-core i7 PC with 32GB RAM. |
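
As a rough illustration of the evaluation protocol quoted in the Dataset Splits and Experiment Setup rows above, the sketch below builds 5 random 50%/25%/25% partitions and derives the 1%-of-training-data minimum-samples threshold. The use of scikit-learn, stratified splitting, and the helper name `make_splits` are assumptions for illustration only; the authors do not release code, so this is not their implementation.

```python
# Minimal sketch (assumed, not from the paper): 5 random splits into
# 50% train / 25% validation / 25% test, as described in the table above.
import numpy as np
from sklearn.model_selection import train_test_split

def make_splits(X, y, n_splits=5, seed=0):
    """Yield (train, val, test) tuples with a 50/25/25 split."""
    rng = np.random.RandomState(seed)
    for _ in range(n_splits):
        s = int(rng.randint(0, 2**31 - 1))
        # Carve off 50% of the samples for training.
        X_tr, X_rest, y_tr, y_rest = train_test_split(
            X, y, train_size=0.5, random_state=s, stratify=y)
        # Split the remaining half evenly: 25% validation, 25% test overall.
        # (Stratification is an assumption; the paper does not specify it.)
        X_val, X_te, y_val, y_te = train_test_split(
            X_rest, y_rest, train_size=0.5, random_state=s, stratify=y_rest)
        yield (X_tr, y_tr), (X_val, y_val), (X_te, y_te)

# Hypothetical usage of the 1%-of-training-data threshold from the setup row:
# min_samples_per_rule = max(1, int(0.01 * len(X_tr)))
```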