Feature Learning for Interpretable, Performant Decision Trees

Authors: Jack Good, Torin Kovach, Kyle Miller, Artur Dubrawski

NeurIPS 2023

Reproducibility assessment (Variable, Result, LLM Response):
Research Type: Experimental
LLM Response: "This section contains evaluation and demonstration of interpretable models. Comprehensive results, as well as additional experiment details, are in the supplementary material. Unless otherwise noted, all results are from crisp trees. ... 4.1 Benchmarks: We compare various configurations of our algorithm against popular tree-based baselines including decision trees, random forests, and Extra Trees. We report 10-fold cross validation accuracy and average number of splits in the model."

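As a rough illustration of the benchmark protocol quoted above, the sketch below runs the same baseline comparison (decision tree, random forest, Extra Trees) with 10-fold cross-validation in scikit-learn. The iris data set and the default hyperparameters are placeholder assumptions; the paper's own feature-learning trees are not public, so only the baselines appear here.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# One of the benchmark data sets listed under Open Datasets below.
X, y = load_iris(return_X_y=True)

baselines = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "extra trees": ExtraTreesClassifier(random_state=0),
}

# 10-fold cross-validation accuracy, as reported in the paper.
for name, model in baselines.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")

# The "average number of splits" for a single tree can be read off the
# fitted structure: internal nodes = total nodes minus leaves.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print("splits:", tree.tree_.node_count - tree.get_n_leaves())
```
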
Researcher Affiliation: Academia
LLM Response: "Jack H. Good, Torin Kovach, Kyle Miller, Artur Dubrawski, Carnegie Mellon University, {jhgood,tkovach,mille856,awd}@andrew.cmu.edu"

Pseudocode: No
LLM Response: The paper does not contain any clearly labeled pseudocode or algorithm blocks. Methods are described textually and through mathematical formulations.

Open Source Code: No
LLM Response: The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.

Open Datasets: Yes
LLM Response: "The data sets are selected from among the most viewed tabular classification data sets on the UCI machine learning repository [14] at the time of writing. ... Results for MNIST trees. ... Table 1: Results of tabular data benchmarks. Number of attributes p is listed before and after one-hot encoding categorical attributes. ... iris [18], heart-disease [30], dry-bean [31], wine [1], car [5], wdbc [44], sonar [38], pendigits [2], ionosphere [39]"

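For reference, three of the listed data sets ship with scikit-learn, which allows a quick sanity check without fetching anything from UCI; the mapping below (e.g. wdbc to load_breast_cancer) is an assumption about which built-in loaders correspond to the cited data sets.

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine

# iris [18], wine [1], and wdbc [44] have scikit-learn loaders; the remaining
# data sets (car, sonar, pendigits, ...) must be fetched from UCI directly.
datasets = {
    "iris": load_iris(return_X_y=True),
    "wine": load_wine(return_X_y=True),
    "wdbc": load_breast_cancer(return_X_y=True),
}
for name, (X, y) in datasets.items():
    print(f"{name}: {X.shape[0]} rows, {X.shape[1]} attributes, {len(set(y))} classes")
```
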
Dataset Splits: Yes
LLM Response: "We report 10-fold cross validation accuracy and average number of splits in the model. ... Our models and the conventional decision trees have cost-complexity pruning α selected by cross-validation."

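Selecting the pruning strength α by cross-validation is standard cost-complexity pruning; a minimal sketch with scikit-learn's CART implementation follows. The data set, the 10-fold grid search, and the construction of the α grid are assumptions, since the paper does not spell out these details.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in: wdbc from the list above

# Candidate alphas come from the tree's own cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = np.unique(np.clip(path.ccp_alphas, 0, None))

# Select the pruning strength alpha by cross-validation.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=10,
    scoring="accuracy",
)
search.fit(X, y)
print("selected alpha:", search.best_params_["ccp_alpha"])
```
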
Hardware Specification: No
LLM Response: No specific hardware details (e.g., GPU models, CPU types, memory) used for running experiments are provided in the paper.

Software Dependencies: No
LLM Response: The paper mentions various algorithms and tools (e.g., CART, Random Forests, XGBoost, scikit-learn) but does not provide specific version numbers for any software dependencies needed to replicate the experiment.

Experiment Setup: Yes
LLM Response: "Categorical attributes are one-hot encoded, and the data is normalized to mean 0 and standard deviation 1. For our models, we show in the main paper results for linear features and distance-to-prototype features with diagonal inverse covariance. Each is regularized with L1 coefficient λ1 = .01 to promote sparsity. Our models and the conventional decision trees have cost-complexity pruning α selected by cross-validation. Other hyperparameters are fixed and described in the supplementary material."

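The preprocessing half of this setup is reproducible directly; a minimal sketch, assuming index-identified categorical columns, is below. The column indices are hypothetical, and the L1-regularized feature learning (λ1 = .01) applies to the paper's own split functions, which are not released, so a plain CART tree stands in at the end of the pipeline.

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column indices; which columns are categorical varies per data set.
categorical_cols = [0, 3]
numeric_cols = [1, 2, 4]

preprocess = ColumnTransformer([
    # One-hot encode categorical attributes.
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    # Normalize numeric attributes to mean 0 and standard deviation 1.
    ("scale", StandardScaler(), numeric_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    # ccp_alpha would be chosen by cross-validation (see Dataset Splits above).
    ("tree", DecisionTreeClassifier(random_state=0)),
])
```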