Optimal Sparse Regression Trees

Authors: Rui Zhang, Rui Xin, Margo Seltzer, Cynthia Rudin

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We ran experiments on 12 datasets; the details are described in Appendix C.1. Our evaluation answers the following: 1. Are trees generated by existing regression tree optimization methods truly optimal? 2. How well do optimal sparse regression trees generalize? 3. How far from optimal are greedy-approach models? (§6.1)"
Researcher Affiliation | Academia | "Rui Zhang¹*, Rui Xin¹*, Margo Seltzer², Cynthia Rudin¹ — ¹Duke University, ²University of British Columbia"
Pseudocode | Yes | "Algorithm 1: compute_lower_bound(dataset, sub, λ) // For a subproblem sub and regularization λ, compute its Equivalent k-Means Lower Bound"
Open Source Code | Yes | "Code Availability: The implementation of OSRT is available at https://github.com/ruizhang1996/optimal-sparse-regression-tree-public."
Open Datasets | Yes | "An example tree for the seoul bike dataset (VE and Cho 2020; Sathishkumar, Park, and Cho 2020; Dua and Graff 2017) constructed by our method is shown in Figure 1."
Dataset Splits | Yes | "Optimization experiments in Appendix D and cross-validation experiments in Appendix H, along with a demonstration of these results in Figure 2, show: (1) trees produced by other methods are usually sub-optimal even when they claim optimality (they do not prove optimality); only our method consistently finds the optimal trees, which trace the efficient frontier of the trade-off between loss and sparsity."
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions software such as IAI, Evtree, GUIDE, and CART, but it does not specify version numbers for these or for any other dependencies such as programming languages or libraries.
Experiment Setup | Yes | "Figure 1: Optimal regression tree for the seoul bike dataset with λ = 0.05, max depth = 5."
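The λ = 0.05 setting above is the sparsity regularizer: methods in this family score a candidate tree by its prediction loss plus λ times the number of leaves. The sketch below illustrates that kind of objective for a piecewise-constant regression tree; it is an illustrative assumption, not the authors' OSRT implementation, and the function name and normalization convention are ours.

```python
# Hypothetical sketch of a sparsity-regularized regression-tree objective:
# normalized squared-error loss plus lam * (number of leaves).
# Illustrative only; not the OSRT authors' code.
import numpy as np

def regularized_objective(y, leaf_assignments, lam):
    """Score a tree given the leaf each sample falls into.

    y: array of regression targets.
    leaf_assignments: leaf id per sample (each leaf predicts its mean).
    lam: per-leaf sparsity penalty (the lambda in the paper's figures).
    """
    y = np.asarray(y, dtype=float)
    leaf_assignments = np.asarray(leaf_assignments)
    leaves = np.unique(leaf_assignments)
    sse = 0.0
    for leaf in leaves:
        y_leaf = y[leaf_assignments == leaf]
        sse += np.sum((y_leaf - y_leaf.mean()) ** 2)  # leaf predicts its mean
    loss = sse / len(y)  # normalize loss by sample count
    return loss + lam * len(leaves)  # penalize every additional leaf

# A 2-leaf tree that separates the targets perfectly pays only the
# sparsity penalty: objective = 0 + 0.05 * 2 = 0.1.
obj = regularized_objective([1, 1, 5, 5], [0, 0, 1, 1], lam=0.05)
```

Under an objective like this, adding a split is only worthwhile when it reduces the normalized loss by more than λ, which is how a single parameter trades accuracy against tree size.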