Generalized and Scalable Optimal Sparse Decision Trees
Authors: Jimmy Lin, Chudi Zhong, Diane Hu, Cynthia Rudin, Margo Seltzer
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | GOSDT’s novelty lies in its ability to optimize a large class of objective functions and its ability to efficiently handle continuous variables without sacrificing optimality. Thus, our evaluation results: 1) Demonstrate our ability to optimize over a large class of objectives (AUC in particular), 2) Show that GOSDT outperforms other approaches in producing models that are both accurate and sparse, and 3) Show how GOSDT scales in its handling of continuous variables relative to other methods. |
| Researcher Affiliation | Academia | 1 University of British Columbia; 2 Duke University, Durham, North Carolina, USA. |
| Pseudocode | Yes | Algorithm 1 constructs and optimizes problems in the dependency graph such that, upon completion, we can extract the optimal tree by traversing the dependency graph by greedily choosing the split with the lowest objective value. |
| Open Source Code | Yes | An implementation of the algorithm is available at: https://github.com/Jimmy-Lin/GeneralizedOptimalSparseDecisionTrees. |
| Open Datasets | Yes | We use the Four Class dataset (Chang & Lin, 2011) to show optimal decision trees corresponding to different objectives. |
| Dataset Splits | No | Figures 4 and 5 show (1) that GOSDT typically produces excellent training and test accuracy with a reasonable number of leaves... No explicit validation set or split percentage is mentioned for reproducibility. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) are provided for the experimental setup. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | We present details of our experimental setup and datasets in Appendix I. |
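The pseudocode description quoted above (extracting the optimal tree by traversing the dependency graph and greedily choosing the split with the lowest objective value) can be illustrated with a minimal sketch. All data structures and names here are hypothetical, not GOSDT's actual implementation: we assume the graph is already solved, so each subproblem stores its optimal objective, its best-leaf objective, and its candidate splits.

```python
# Hypothetical sketch: extract a tree from a solved dependency graph by
# greedily choosing, at each subproblem, the split (or leaf) with the
# lowest objective value. Names are illustrative, not GOSDT's API.

def extract_tree(graph, key):
    node = graph[key]
    # Start by assuming the subproblem is best solved as a leaf.
    best_obj = node["leaf_objective"]
    best = ("leaf", node["prediction"])
    # Compare against every candidate split; a split's objective is the
    # sum of the optimal objectives of its two child subproblems.
    for feature, (left_key, right_key) in node["splits"].items():
        obj = graph[left_key]["objective"] + graph[right_key]["objective"]
        if obj < best_obj:
            best_obj = obj
            best = ("split", feature, left_key, right_key)
    if best[0] == "leaf":
        return {"prediction": best[1]}
    _, feature, left_key, right_key = best
    return {"feature": feature,
            "left": extract_tree(graph, left_key),
            "right": extract_tree(graph, right_key)}

# Toy solved graph: splitting the root on feature "x1" (objective
# 0.1 + 0.1 = 0.2) beats labeling it as a single leaf (0.4).
graph = {
    "root": {"objective": 0.2, "leaf_objective": 0.4, "prediction": 0,
             "splits": {"x1": ("A", "B")}},
    "A": {"objective": 0.1, "leaf_objective": 0.1, "prediction": 0,
          "splits": {}},
    "B": {"objective": 0.1, "leaf_objective": 0.1, "prediction": 1,
          "splits": {}},
}
tree = extract_tree(graph, "root")
```

On this toy graph the extraction returns a single split on `x1` with two leaves predicting 0 and 1, matching the greedy lowest-objective rule described in the pseudocode quote.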