Scalable Optimal Multiway-Split Decision Trees with Constraints
Authors: Shivaram Subramanian, Wei Sun
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate its efficacy with extensive experiments. We present results on datasets containing up to 1,008,372 samples while existing MIP-based decision tree models do not scale well on data beyond a few thousand points. |
| Researcher Affiliation | Industry | IBM Research, Yorktown Heights, New York, USA {subshiva, sunw}@us.ibm.com |
| Pseudocode | No | The paper describes the column generation procedure and K-shortest path heuristic verbally but does not include a formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate the same 12 classification datasets from the UCI repository (Dua and Graff 2017)... UCI Machine Learning Repository. http://archive.ics.uci.edu/ml. |
| Dataset Splits | Yes | We create 5 random splits for each dataset into training (50%), validation (25%), and test sets (25%). |
| Hardware Specification | Yes | All experiments were run on an Intel 8-core i7 PC with 32GB RAM. |
| Software Dependencies | Yes | CPLEX 20.1 (CPLEX 2021) was used to solve the MIP-based methods. |
| Experiment Setup | Yes | The minimum number of samples per rule for OMT was set to 1% of the training data. The maximum value for K was set to 1000 in the KSP for all instances except the large dataset experiments, where it was reduced to 100 to stay within the RAM limit. We set a maximum CG iteration limit of 40 and an L̂ limit of 10,000. All experiments were run on an Intel 8-core i7 PC with 32GB RAM. |
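
As a rough illustration of the evaluation protocol quoted in the Dataset Splits and Experiment Setup rows above, the sketch below builds 5 random 50%/25%/25% partitions and derives the 1%-of-training-data minimum-samples threshold. The use of scikit-learn, stratified splitting, and the helper name `make_splits` are assumptions for illustration only; the authors do not release code, so this is not their implementation.

```python
# Minimal sketch (assumed, not from the paper): 5 random splits into
# 50% train / 25% validation / 25% test, as described in the table above.
import numpy as np
from sklearn.model_selection import train_test_split

def make_splits(X, y, n_splits=5, seed=0):
    """Yield (train, val, test) tuples with a 50/25/25 split."""
    rng = np.random.RandomState(seed)
    for _ in range(n_splits):
        s = int(rng.randint(0, 2**31 - 1))
        # Carve off 50% of the samples for training.
        X_tr, X_rest, y_tr, y_rest = train_test_split(
            X, y, train_size=0.5, random_state=s, stratify=y)
        # Split the remaining half evenly: 25% validation, 25% test overall.
        # (Stratification is an assumption; the paper does not specify it.)
        X_val, X_te, y_val, y_te = train_test_split(
            X_rest, y_rest, train_size=0.5, random_state=s, stratify=y_rest)
        yield (X_tr, y_tr), (X_val, y_val), (X_te, y_te)

# Hypothetical usage of the 1%-of-training-data threshold from the setup row:
# min_samples_per_rule = max(1, int(0.01 * len(X_tr)))
```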