Generalized and Scalable Optimal Sparse Decision Trees

Authors: Jimmy Lin, Chudi Zhong, Diane Hu, Cynthia Rudin, Margo Seltzer

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "GOSDT’s novelty lies in its ability to optimize a large class of objective functions and its ability to efficiently handle continuous variables without sacrificing optimality. Thus, our evaluation results: 1) Demonstrate our ability to optimize over a large class of objectives (AUC in particular), 2) Show that GOSDT outperforms other approaches in producing models that are both accurate and sparse, and 3) Show how GOSDT scales in its handling of continuous variables relative to other methods."
Researcher Affiliation | Academia | "1 University of British Columbia, 2 Duke University, Durham, North Carolina, USA."
Pseudocode | Yes | "Algorithm 1 constructs and optimizes problems in the dependency graph such that, upon completion, we can extract the optimal tree by traversing the dependency graph, greedily choosing the split with the lowest objective value." (A minimal sketch of this extraction step appears below the table.)
Open Source Code | Yes | "An implementation of the algorithm is available at: https://github.com/Jimmy-Lin/GeneralizedOptimalSparseDecisionTrees." (A hypothetical usage sketch appears below the table.)
Open Datasets | Yes | "We use the Four Class dataset (Chang & Lin, 2011) to show optimal decision trees corresponding to different objectives."
Dataset Splits | No | "Figures 4 and 5 show (1) that GOSDT typically produces excellent training and test accuracy with a reasonable number of leaves..." No explicit validation set or split percentage is mentioned for reproducibility.
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) are provided for the experimental setup.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned (e.g., Python, PyTorch, or specific library versions).
Experiment Setup | Yes | "We present details of our experimental setup and datasets in Appendix I."
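
The Pseudocode row describes the final step of Algorithm 1: once every subproblem in the dependency graph has been optimized, the optimal tree is read off by repeatedly following the lowest-objective choice at each subproblem. The Python sketch below illustrates that extraction step under assumed data structures; `graph`, `Node`, and the candidate-entry tuples are illustrative, not the repository's actual representation.

```python
# Sketch of extracting the optimal tree from a solved dependency graph.
# All names and data layouts here are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

@dataclass
class Node:
    feature: Optional[int] = None      # splitting feature; None at a leaf
    prediction: Optional[int] = None   # predicted label at a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

Key = FrozenSet[int]   # a subproblem = the set of sample indices it covers
# Candidate entries per subproblem, each already optimized:
#   ("leaf", objective, prediction)
#   ("split", objective, feature, left_key, right_key)
Entry = Tuple

def extract(graph: Dict[Key, List[Entry]], key: Key) -> Node:
    """Traverse the solved graph, greedily taking the lowest-objective option."""
    best = min(graph[key], key=lambda e: e[1])
    if best[0] == "leaf":
        return Node(prediction=best[2])
    _, _, feature, left_key, right_key = best
    return Node(feature=feature,
                left=extract(graph, left_key),
                right=extract(graph, right_key))

# Toy solved graph over samples {0, 1, 2, 3}: splitting on feature 0 beats
# predicting a single leaf, and both children are pure.
root = frozenset({0, 1, 2, 3})
graph = {
    root: [("leaf", 0.5, 0),
           ("split", 0.1, 0, frozenset({0, 1}), frozenset({2, 3}))],
    frozenset({0, 1}): [("leaf", 0.0, 0)],
    frozenset({2, 3}): [("leaf", 0.0, 1)],
}
tree = extract(graph, root)
assert tree.feature == 0 and tree.left.prediction == 0 and tree.right.prediction == 1
```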
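
The Open Source Code row points at the released implementation. As a rough orientation only, a typical invocation might look like the sketch below; the import path, the "regularization" hyperparameter name (the per-leaf sparsity penalty, lambda in the paper's objective), and the accessor calls are assumptions, so the repository README is the authoritative reference.

```python
# Hypothetical usage of the released GOSDT implementation; names marked
# below are assumptions -- check the repository README for the real API.
import pandas as pd
from model.gosdt import GOSDT   # assumed import path

df = pd.read_csv("data.csv")    # assumes the label is in the last column
X, y = df.iloc[:, :-1], df.iloc[:, -1]

# "regularization" is assumed to be the sparsity penalty (lambda),
# which trades accuracy against the number of leaves.
model = GOSDT({"regularization": 0.1})
model.fit(X, y)
print("training accuracy:", model.score(X, y))   # assumed accessor
```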