Decision trees as partitioning machines to characterize their generalization properties
Authors: Jean-Samuel Leboeuf, Frédéric LeBlanc, Mario Marchand
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark our pruning algorithm on 19 datasets taken from the UCI Machine Learning Repository [Dua and Graff, 2017]. We compare our pruning algorithm to CART's cost-complexity algorithm... Table 1 presents the results of the four models we tested. ... Mean test accuracy and standard deviation on 25 random splits of 19 datasets... |
| Researcher Affiliation | Academia | Jean-Samuel Leboeuf, Department of Computer Science and Software Engineering, Université Laval, Québec, QC, Canada (jean-samuel.leboeuf.1@ulaval.ca); Frédéric Le Blanc, Department of Mathematics and Statistics, Université de Moncton, Moncton, NB, Canada (efl7151@umoncton.ca); Mario Marchand, Department of Computer Science and Software Engineering, Université Laval, Québec, QC, Canada (mario.marchand@ift.ulaval.ca) |
| Pseudocode | Yes | The formal version of the algorithm is presented in Algorithm 3 of Appendix E.1. |
| Open Source Code | Yes | The source code used in the experiments and to produce the tables is freely available at the address https://github.com/jsleb333/paper-decision-trees-as-partitioning-machines. |
| Open Datasets | Yes | We benchmark our pruning algorithm on 19 datasets taken from the UCI Machine Learning Repository [Dua and Graff, 2017]. |
| Dataset Splits | Yes | As such, we chose to randomly split each dataset so that the models are trained on 75% of the examples and tested on the remaining 25%. To limit the effect of the randomness of the splits, we run each experiment 25 times and we report the mean test accuracy and the standard deviation. (A sketch of this protocol appears after the table.) |
| Hardware Specification | No | The paper provides no hardware details (GPU/CPU models, memory, or computing environment) for the experiments; it states only that 'All experiments were done in pure Python.' |
| Software Dependencies | No | The paper mentions 'pure Python' and the 'scikit-learn Python package' but does not give version numbers for these or any other software dependencies, which are needed for reproducibility. |
| Experiment Setup | Yes | The first model we consider is the greedily learned tree, grown using the Gini index until the tree has 100% classification accuracy on the training set or reaches 40 leaves. We impose this limit since the computation times for pruning trees become prohibitive for a large number of leaves. (A sketch of this setup appears after the table.) |
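To make the experiment-setup row concrete, here is a minimal sketch of the greedily learned tree, assuming a CART-style learner as implemented in scikit-learn. The paper's implementation is pure Python and not necessarily scikit-learn, so the `DecisionTreeClassifier` call and its parameters are an illustrative stand-in, not the authors' code.

```python
# A minimal scikit-learn approximation of the greedily learned tree:
# splits chosen by the Gini index, growth stopped when all leaves are
# pure (100% training accuracy on separable data) or at 40 leaves.
from sklearn.tree import DecisionTreeClassifier

greedy_tree = DecisionTreeClassifier(
    criterion="gini",   # split quality measured by the Gini index
    max_leaf_nodes=40,  # hard cap of 40 leaves, as in the paper
)
# Caveat: setting max_leaf_nodes makes scikit-learn grow the tree
# best-first, which may differ from the authors' greedy learner.
```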
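The dataset-splits row (25 random 75/25 splits, mean test accuracy and standard deviation) can be sketched the same way. The `evaluate` helper and the use of `train_test_split` and the built-in Breast Cancer Wisconsin data (a UCI dataset used here as a stand-in) are assumptions, not the authors' pipeline.

```python
# A minimal sketch of the evaluation protocol: 25 random 75/25 splits,
# reporting mean test accuracy and standard deviation.
import numpy as np
from sklearn.datasets import load_breast_cancer  # UCI dataset stand-in
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y, n_runs=25, test_size=0.25):
    """Mean and std of test accuracy over repeated random splits."""
    accuracies = []
    for seed in range(n_runs):
        # 75% train / 25% test, re-drawn on every run
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        tree = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=40,
                                      random_state=seed)
        tree.fit(X_train, y_train)
        accuracies.append(tree.score(X_test, y_test))
    return np.mean(accuracies), np.std(accuracies)

X, y = load_breast_cancer(return_X_y=True)
mean_acc, std_acc = evaluate(X, y)
print(f"test accuracy: {mean_acc:.3f} ± {std_acc:.3f}")
```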