Hierarchical Shrinkage: Improving the accuracy and interpretability of tree-based models.
Authors: Abhineet Agarwal, Yan Shuo Tan, Omer Ronen, Chandan Singh, Bin Yu
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments over a wide variety of real-world datasets show that HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques. |
| Researcher Affiliation | Academia | 1Department of Statistics, UC Berkeley, Berkeley, California, USA 2EECS Department, UC Berkeley, Berkeley, California, USA. Correspondence to: Bin Yu <binyu@berkeley.edu>. |
| Pseudocode | No | The algorithm is described in detail using mathematical formulas (Equations 1 and 2) and text within Section 2, but there is no distinct block labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | All code and models are released in a full-fledged package available on GitHub. HS is integrated into the imodels package github.com/csinva/imodels (Singh et al., 2021) with an sklearn-compatible API. (A minimal usage sketch appears below the table.) |
| Open Datasets | Yes | In this section, we study the performance of HS on a collection of classification and regression datasets selected as follows. For classification, we consider a number of datasets used in the classic Random Forest paper (Breiman, 2001; Asuncion & Newman, 2007), one (Breast cancer with id=13) from the OpenML repository, as well as two (Juvenile and Recidivism) that are commonly used to evaluate rule-based models (Wang, 2019). For regression, we consider all datasets used by Breiman (2001) with at least 200 samples, as well as a variety of datasets from the PMLB benchmark (Romano et al., 2020) ranging from small to large sample sizes. |
| Dataset Splits | Yes | In all cases, 2/3 of the data is used for training (hyperparameters are selected via 3-fold CV on this set) and 1/3 of the data is used for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software packages like 'ranger', 'scikit-learn', 'bartpy', and 'imodels' but does not specify their version numbers, which are necessary for reproducible dependency descriptions. |
| Experiment Setup | Yes | For each tree, we compute its prediction performance before and after applying HS, where the regularization parameter for HS is selected from the set λ ∈ {0.1, 1.0, 10.0, 25.0, 50.0, 100.0} via cross-validation. Results for each experiment are averaged over 10 random data splits. In Fig S1, we simulate data via a linear model y = Σ_{i=1}^{10} x_i + ε with x ~ Unif[0, 1]^50 and ε drawn from a Gaussian or a Laplacian distribution for the left and right panel respectively, with noise variance σ² = 0.01 in both cases. In both experiments, we used a training set of 500 samples to fit CART and hsCART models with a prescribed number of leaves, varying this number across a grid. For each hsCART model, the regularization parameter λ was chosen on the training set via 3-fold cross-validation. (A simulation sketch appears below the table.) |
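Since HS ships in the imodels package with an sklearn-compatible API, applying it is essentially a drop-in replacement for a plain decision tree. The sketch below is a minimal illustration, not the paper's benchmark code; the HSTreeClassifierCV name and its estimator_/reg_param_list arguments are assumed from the imodels documentation, so check github.com/csinva/imodels for the exact current signature.

```python
# Minimal sketch: wrap a CART tree with hierarchical shrinkage via imodels.
# Assumed API: HSTreeClassifierCV(estimator_=..., reg_param_list=...);
# verify against the imodels docs before relying on it.
from imodels import HSTreeClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# 2/3 train, 1/3 test, mirroring the paper's split protocol
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

# The CV variant selects the shrinkage strength lambda by cross-validation
# on the training set; the grid below is the one quoted in the table.
model = HSTreeClassifierCV(
    estimator_=DecisionTreeClassifier(max_leaf_nodes=20),
    reg_param_list=[0.1, 1.0, 10.0, 25.0, 50.0, 100.0],
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```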
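For the mechanics behind that wrapper, the sketch below implements node-based hierarchical shrinkage from scratch on a fitted sklearn regression tree and runs it on a Fig S1-style simulation (y = Σ_{i=1}^{10} x_i + Gaussian noise, σ² = 0.01). The shrink_tree helper and the fixed λ = 10 are illustrative assumptions, not the paper's reference implementation; the paper selects λ by 3-fold cross-validation on the training set.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def shrink_tree(fitted_tree, reg_param):
    """Apply node-based hierarchical shrinkage in place (illustrative).

    Each node's prediction becomes the root mean plus the telescoping
    parent-to-child differences along the root-to-node path, with the
    step at each parent damped by 1 / (1 + reg_param / n_samples(parent)).
    """
    t = fitted_tree.tree_
    raw = t.value.copy()  # unshrunk node means, shape (n_nodes, 1, 1)

    def recurse(node, parent):
        if parent is None:
            shrunk = raw[node]  # root mean is left unshrunk
        else:
            damp = 1.0 + reg_param / t.n_node_samples[parent]
            # t.value[parent] already holds the shrunk parent value
            shrunk = t.value[parent] + (raw[node] - raw[parent]) / damp
        t.value[node] = shrunk
        if t.children_left[node] != -1:  # internal node: descend
            recurse(t.children_left[node], node)
            recurse(t.children_right[node], node)

    recurse(0, None)
    return fitted_tree

# Fig S1-style data: y = sum of the first 10 of 50 uniform features + noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(750, 50))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.1, size=750)  # sigma^2 = 0.01
# 2/3-1/3 split leaves 500 training samples, as in the paper
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1 / 3, random_state=0)

cart = DecisionTreeRegressor(max_leaf_nodes=30, random_state=0).fit(X_tr, y_tr)
print("CART   test MSE:", mean_squared_error(y_te, cart.predict(X_te)))
shrink_tree(cart, reg_param=10.0)  # lambda fixed here for brevity
print("hsCART test MSE:", mean_squared_error(y_te, cart.predict(X_te)))
```

Since prediction only reads leaf values, writing shrunk values into internal nodes is harmless and keeps the sketch short; a production version would also need to handle multi-output trees and classifiers.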