Harnessing the power of choices in decision tree learning
Authors: Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan, Mo Tiwari
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate, theoretically and empirically, the power of this simple generalization. We first prove a greediness hierarchy theorem showing that for every k ∈ ℕ, Top-(k + 1) can be dramatically more powerful than Top-k: there are data distributions for which the former achieves accuracy 1 − ε, whereas the latter only achieves accuracy 1/2 + ε. We then show, through extensive experiments, that Top-k outperforms the two main approaches to decision tree learning: classic greedy algorithms and more recent optimal decision tree algorithms. |
| Researcher Affiliation | Academia | Guy Blanc (Stanford, gblanc@stanford.edu); Jane Lange (MIT, jlange@mit.edu); Chirag Pabbaraju (Stanford, cpabbara@stanford.edu); Colin Sullivan (Stanford, colins26@stanford.edu); Li-Yang Tan (Stanford, lytan@stanford.edu); Mo Tiwari (Stanford, motiwari@stanford.edu) |
| Pseudocode | Yes | Pseudocode for Top-k is provided in Figure 1. |
| Open Source Code | Yes | The code to reproduce our results is available at: https://github.com/SullivanC19/pydl8.5-topk. |
| Open Datasets | Yes | We run experiments on a variety of datasets from the UCI Machine Learning Repository [DG17] (numerical as well as categorical features) having a size in the thousands and having 50–300 features after binarization. |
| Dataset Splits | Yes | All plots are averaged over 10 random train-test splits (except avila and ml-prove that have pre-specified splits) with confidence intervals plotted for 2 standard deviations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "scikit-learn" and "PyDL8.5" but does not specify the version numbers of these software dependencies as used in the experiments. |
| Experiment Setup | Yes | We build decision trees corresponding to binary entropy as the impurity measure H. In order to leverage existing engineering optimizations from state-of-the-art optimal decision tree implementations, we implement the Top-k algorithm given in Figure 1 via simple modifications to the PyDL8.5 [ANS20, ANS21] codebase. ... We do this for depth = 4, 5, 6 (for GOSDT, the regularization coefficient λ is set to 2^−depth). ... All plots are averaged over 10 random train-test splits (except avila and ml-prove that have pre-specified splits) with confidence intervals plotted for 2 standard deviations. ... For 3 datasets (car, hayes-roth, and tic-tac-toe) we plot train and test error as a function of k. All runs averaged over 10 random train-test splits with maximum depth fixed to 3. |
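
To make the Top-k idea referenced in the Pseudocode and Research Type rows concrete, here is a minimal sketch of a Top-k style tree builder. It assumes binary 0/1 features and binary labels, uses binary entropy as the impurity measure (as in the paper), and selects among the k candidate splits by training error; the function names and the error-based selection rule are illustrative assumptions, not the authors' PyDL8.5-based implementation.

```python
import numpy as np

def entropy(y):
    """Binary entropy of a 0/1 label vector (impurity measure H)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def gain(X, y, j):
    """Entropy reduction from splitting on binary feature j."""
    left, right = y[X[:, j] == 0], y[X[:, j] == 1]
    w_l, w_r = len(left) / len(y), len(right) / len(y)
    return entropy(y) - (w_l * entropy(left) + w_r * entropy(right))

def top_k_tree(X, y, k, depth):
    """Return (tree, training error count) for the best tree found.

    A tree is either an int label (leaf) or a tuple
    (feature index, left subtree, right subtree).
    Setting k=1 roughly recovers the classic greedy learner;
    larger k explores more split "choices" per node.
    """
    majority = int(np.round(np.mean(y))) if len(y) else 0
    leaf_errors = int(np.sum(y != majority))
    if depth == 0 or len(y) == 0 or leaf_errors == 0:
        return majority, leaf_errors

    # Rank features by impurity gain and branch over the k best choices.
    scores = [gain(X, y, j) for j in range(X.shape[1])]
    candidates = np.argsort(scores)[::-1][:k]

    best_tree, best_errors = majority, leaf_errors
    for j in candidates:
        mask = X[:, j] == 1
        l_tree, l_err = top_k_tree(X[~mask], y[~mask], k, depth - 1)
        r_tree, r_err = top_k_tree(X[mask], y[mask], k, depth - 1)
        if l_err + r_err < best_errors:
            best_tree, best_errors = (j, l_tree, r_tree), l_err + r_err
    return best_tree, best_errors
```

Because each of the k choices is explored recursively, the cost grows roughly like k raised to the tree depth, which is why the paper builds on an optimized optimal-decision-tree codebase rather than a naive recursion like this one.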
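
The Dataset Splits and Experiment Setup rows describe averaging error over 10 random train-test splits with 2-standard-deviation confidence intervals. The sketch below reproduces that protocol with scikit-learn's greedy learner as a stand-in baseline; the 80/20 split ratio and the use of `DecisionTreeClassifier` are assumptions for illustration, since the paper's Top-k runs use its modified PyDL8.5 codebase.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def mean_test_error(X, y, depth=3, n_splits=10, seed=0):
    """Average test error over repeated random train-test splits,
    reported as (mean, 2 * standard deviation)."""
    errors = []
    for i in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed + i)
        clf = DecisionTreeClassifier(criterion="entropy", max_depth=depth)
        clf.fit(X_tr, y_tr)
        errors.append(1.0 - clf.score(X_te, y_te))
    return float(np.mean(errors)), 2.0 * float(np.std(errors))
```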