Decision trees as partitioning machines to characterize their generalization properties

Authors: Jean-Samuel Leboeuf, Frédéric LeBlanc, Mario Marchand

NeurIPS 2020

Reproducibility assessment (variable, result, and supporting LLM response):

Research Type: Experimental (the pruning comparison is sketched after this table)
LLM Response: "We benchmark our pruning algorithm on 19 datasets taken from the UCI Machine Learning Repository [Dua and Graff, 2017]. We compare our pruning algorithm to CART's cost-complexity algorithm... Table 1 presents the results of the four models we tested. ... Mean test accuracy and standard deviation on 25 random splits of 19 datasets..."

Researcher Affiliation: Academia
LLM Response: "Jean-Samuel Leboeuf, Department of Computer Science and Software Engineering, Université Laval, Québec, QC, Canada (jean-samuel.leboeuf.1@ulaval.ca); Frédéric Le Blanc, Department of Mathematics and Statistics, Université de Moncton, Moncton, NB, Canada (efl7151@umoncton.ca); Mario Marchand, Department of Computer Science and Software Engineering, Université Laval, Québec, QC, Canada (mario.marchand@ift.ulaval.ca)"

Pseudocode: Yes
LLM Response: "The formal version of the algorithm is presented in Algorithm 3 of Appendix E.1."

Open Source Code: Yes
LLM Response: "The source code used in the experiments and to produce the tables is freely available at the address https://github.com/jsleb333/paper-decision-trees-as-partitioning-machines."

Open Datasets: Yes
LLM Response: "We benchmark our pruning algorithm on 19 datasets taken from the UCI Machine Learning Repository [Dua and Graff, 2017]."

Dataset Splits: Yes (the split protocol is sketched after this table)
LLM Response: "As such, we chose to randomly split each dataset so that the models are trained on 75% of the examples and tested on the remaining 25%. To limit the effect of the randomness of the splits, we run each experiment 25 times and we report the mean test accuracy and the standard deviation."

Hardware Specification: No
LLM Response: The paper provides no hardware details (CPU/GPU models, memory, or computing environment) for the experiments; it states only that "All experiments were done in pure Python."

Software Dependencies: No
LLM Response: The paper mentions "pure Python" and the "scikit-learn Python package" but gives no version numbers for these or any other software dependencies, which are needed for reproducibility.

Experiment Setup: Yes (the setup is sketched after this table)
LLM Response: "The first model we consider is the greedily learned tree, grown using the Gini index until the tree has 100% classification accuracy on the training set or reaches 40 leaves. We impose this limit since the computation times for pruning trees become prohibitive for a large number of leaves."

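To make the quoted details concrete, here are three short sketches. First, the greedy model from the Experiment Setup entry. This is a minimal scikit-learn rendering, not the authors' implementation: it assumes that best-first growth with a leaf cap (which stops early once all leaves are pure) matches the quoted stopping rule, and load_wine stands in for the paper's UCI datasets.

```python
# Hedged sketch of the Experiment Setup entry, not the authors' code.
from sklearn.datasets import load_wine  # stand-in for a UCI dataset
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# criterion="gini" grows the tree greedily with the Gini index;
# max_leaf_nodes=40 enables best-first growth, which stops early if every
# leaf becomes pure (100% training accuracy) before the 40-leaf cap.
tree = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=40,
                              random_state=0)
tree.fit(X, y)

print("training accuracy:", tree.score(X, y))    # 1.0 if pure before the cap
print("number of leaves:", tree.get_n_leaves())  # at most 40
```
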
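Second, the Dataset Splits entry: 25 independent random 75%/25% splits, with the mean and standard deviation of test accuracy reported. Again a hedged sketch; the dataset loader and classifier are placeholders for the paper's 19 datasets and four models.

```python
# Hedged sketch of the Dataset Splits entry, not the authors' code.
import numpy as np
from sklearn.datasets import load_iris  # stand-in for a UCI dataset
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

accuracies = []
for seed in range(25):  # 25 independent random splits
    # 75% of the examples for training, the remaining 25% for testing
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=seed)
    model = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=40,
                                   random_state=seed).fit(X_tr, y_tr)
    accuracies.append(model.score(X_te, y_te))

# Report the mean test accuracy and its standard deviation, as in the paper.
print(f"test accuracy: {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```
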
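Finally, the CART cost-complexity baseline mentioned under Research Type can be approximated with scikit-learn's cost_complexity_pruning_path, which enumerates the candidate pruning strengths (ccp_alpha) for a dataset. Selecting the strength by cross-validation is standard CART practice and an assumption here, not a detail confirmed by the paper.

```python
# Hedged sketch of a CART cost-complexity pruning baseline, not the
# authors' code; load_breast_cancer is a stand-in UCI-style dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Enumerate the candidate pruning strengths for this dataset.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Assumption: pick the strength with the best cross-validated accuracy,
# then refit the pruned tree on the full training data.
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y).mean(),
)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)
print(f"chosen ccp_alpha={best_alpha:.5f}, leaves={pruned.get_n_leaves()}")
```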