Harnessing the power of choices in decision tree learning

Authors: Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan, Mo Tiwari

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate, theoretically and empirically, the power of this simple generalization. We first prove a greediness hierarchy theorem showing that for every k ∈ ℕ, Top-(k + 1) can be dramatically more powerful than Top-k: there are data distributions for which the former achieves accuracy 1 − ε, whereas the latter only achieves accuracy 1/2 + ε. We then show, through extensive experiments, that Top-k outperforms the two main approaches to decision tree learning: classic greedy algorithms and more recent optimal decision tree algorithms.
Researcher Affiliation | Academia | Guy Blanc (Stanford, gblanc@stanford.edu), Jane Lange (MIT, jlange@mit.edu), Chirag Pabbaraju (Stanford, cpabbara@stanford.edu), Colin Sullivan (Stanford, colins26@stanford.edu), Li-Yang Tan (Stanford, lytan@stanford.edu), Mo Tiwari (Stanford, motiwari@stanford.edu)
Pseudocode | Yes | Pseudocode for Top-k is provided in Figure 1. (A hedged Python sketch of such a recursion appears after the table.)
Open Source Code | Yes | The code to reproduce our results is available at: https://github.com/SullivanC19/pydl8.5-topk.
Open Datasets | Yes | We run experiments on a variety of datasets from the UCI Machine Learning Repository [DG17] (numerical as well as categorical features) having a size in the thousands and having 50–300 features after binarization.
Dataset Splits | Yes | All plots are averaged over 10 random train-test splits (except avila and ml-prove that have pre-specified splits) with confidence intervals plotted for 2 standard deviations.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using "scikit-learn" and "PyDL8.5" but does not specify the version numbers for these software dependencies as used in their experiments.
Experiment Setup | Yes | We build decision trees corresponding to binary entropy as the impurity measure H. In order to leverage existing engineering optimizations from state-of-the-art optimal decision tree implementations, we implement the Top-k algorithm given in Figure 1 via simple modifications to the PyDL8.5 [ANS20, ANS21] codebase. ... We do this for depth = 4, 5, 6 (for GOSDT, the regularization coefficient λ is set to 2^(−depth)). ... All plots are averaged over 10 random train-test splits (except avila and ml-prove that have pre-specified splits) with confidence intervals plotted for 2 standard deviations. ... For 3 datasets (car, hayes-roth and tic-tac-toe) we plot train and test error as a function of k. All runs averaged over 10 random train-test splits with maximum depth fixed to 3. (A sketch of this split-and-average protocol also follows the table.)
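
The Pseudocode row points to Figure 1 of the paper, which is not reproduced on this page. The snippet below is a minimal, illustrative Python sketch of a Top-k style recursion on binarized 0/1 features, using binary entropy as the impurity measure as in the Experiment Setup row: at each node it scores every feature, recurses into the k highest-scoring splits, and keeps the candidate subtree with the best training accuracy. The function names and the accuracy-based selection among candidates are assumptions made for illustration; the authors' actual implementation is built via modifications to the PyDL8.5 codebase linked above.

```python
import numpy as np

def entropy(y):
    """Binary entropy (base 2) of a 0/1 label array."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def information_gain(X, y, feature):
    """Entropy reduction from splitting on a binary feature."""
    mask = X[:, feature] == 1
    n, n1 = len(y), mask.sum()
    if n1 in (0, n):
        return 0.0
    return entropy(y) - (n1 / n) * entropy(y[mask]) - ((n - n1) / n) * entropy(y[~mask])

def topk_tree(X, y, depth, k):
    """Return (tree, training_accuracy) for a depth-budgeted Top-k search.

    A tree is either an int label (leaf) or a tuple
    (feature, subtree_for_feature_0, subtree_for_feature_1).
    """
    majority = int(np.mean(y) >= 0.5) if len(y) else 0
    leaf_acc = np.mean(y == majority) if len(y) else 1.0
    if depth == 0 or len(y) == 0:
        return majority, leaf_acc

    gains = [(information_gain(X, y, f), f) for f in range(X.shape[1])]
    gains.sort(reverse=True)
    best_tree, best_acc = majority, leaf_acc

    # Top-k: recurse into each of the k highest-gain splits and keep the
    # candidate with the best training accuracy (k = 1 is the greedy rule).
    for gain, f in gains[:k]:
        mask = X[:, f] == 1
        right, acc_r = topk_tree(X[mask], y[mask], depth - 1, k)
        left, acc_l = topk_tree(X[~mask], y[~mask], depth - 1, k)
        n, n1 = len(y), mask.sum()
        acc = (n1 * acc_r + (n - n1) * acc_l) / n
        if acc > best_acc:
            best_tree, best_acc = (f, left, right), acc

    return best_tree, best_acc
```

On binarized numpy arrays X and y, a call such as topk_tree(X, y, depth=3, k=4) returns one candidate tree and its training accuracy; with k = 1 this collapses to the usual greedy construction, while larger k widens the search at every node, which is exactly the Top-k versus Top-(k + 1) comparison quoted in the Research Type row.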
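
The Dataset Splits and Experiment Setup rows describe the evaluation protocol: 10 random train-test splits, error bands of 2 standard deviations, depth budgets of 4 to 6 (with GOSDT's λ tied to the depth budget), and a sweep over k at depth 3. The sketch below illustrates that protocol with scikit-learn's train_test_split; the fit_fn placeholder, the 80/20 split fraction, and the loop skeleton at the end are assumptions for illustration, not the authors' experiment harness.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def mean_error_over_splits(X, y, fit_fn, n_splits=10, test_size=0.2, seed=0):
    """Average test error of one tree learner over random train-test splits.

    fit_fn(X_train, y_train) is a stand-in for any of the compared learners
    (Top-k, a greedy baseline, or an optimal-tree solver) and must return an
    object with a .predict method.  The 80/20 split is an assumed ratio; the
    rows above do not state the split fraction.
    """
    errors = []
    for i in range(n_splits):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed + i)
        model = fit_fn(X_tr, y_tr)
        errors.append(np.mean(model.predict(X_te) != y_te))
    errors = np.asarray(errors)
    # The paper's plots report the mean with a band of 2 standard deviations.
    return errors.mean(), 2 * errors.std()

# Depth sweep from the Experiment Setup row; GOSDT's regularization
# coefficient is tied to the depth budget.
for depth in (4, 5, 6):
    gosdt_lambda = 2.0 ** (-depth)
    # fit each learner with this depth budget and record its error band here
```

Reusing the same sequence of random states for every learner keeps the splits identical across methods, which is the natural way to make the averaged errors directly comparable across the algorithms being benchmarked.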