Enumerating Distinct Decision Trees
Authors: Salvatore Ruggieri
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Experiments): Table 1 reports the number of instances and of features for small and large standard benchmark datasets publicly available from (Lichman, 2013). Fig. 3 shows, for the IG split criterion, the distribution of distinct decision trees w.r.t. the size of attribute subset. |
| Researcher Affiliation | Academia | 1University of Pisa and ISTI-CNR, Pisa, Italy. |
| Pseudocode | Yes | Algorithm 1 subset(R, S) enumerates R ⊕ Pow(S); Algorithm 2 DTdistinct(R, S) enumerates distinct decision trees necessarily using R and possibly using S as split features |
| Open Source Code | Yes | The extended YaDT version is publicly downloadable from: http://pages.di.unipi.it/ruggieri. |
| Open Datasets | Yes | Table 1 reports the number of instances and of features for small and large standard benchmark datasets publicly available from (Lichman, 2013). |
| Dataset Splits | Yes | Following (Reunanen, 2003), we adopt 5-repeated stratified 10-fold cross validation in experimenting with wrapper models. For each holdout fold, feature selection is performed by splitting the 9-fold training set into 70% building set and 30% search set using stratified random sampling. |
| Hardware Specification | Yes | Tests were performed on a commodity PC with a 4-core Intel i5-2410 @ 2.30 GHz, 16 GB RAM, and Windows 10 OS. |
| Software Dependencies | No | The paper mentions implementation in C++ using the YaDT system, but does not provide specific version numbers for the compiler, YaDT system, or any other software dependencies. |
| Experiment Setup | Yes | Information Gain (IG) is used as quality measure in node splitting. No form of tree simplification (e.g., error-based pruning) is used. The m parameter is set to the small value 2 for all datasets. |
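
The Pseudocode row above refers to two enumeration procedures from the paper. As an illustration of the idea behind Algorithm 1, the following is a minimal Python sketch, not the paper's recursive formulation, that enumerates every feature subset necessarily containing R and optionally containing features from S; the function name and iteration order are assumptions made for readability.

```python
from itertools import combinations

def subset(R, S):
    """Yield every feature set of the form R | X with X a subset of S.
    Illustrative sketch only; the paper's Algorithm 1 enumerates the
    same family of subsets with a recursive procedure."""
    S = list(S)
    for r in range(len(S) + 1):
        for X in combinations(S, r):
            yield set(R) | set(X)

# Example: R = {"a"}, S = {"b", "c"} yields {a}, {a, b}, {a, c}, {a, b, c}.
for features in subset({"a"}, {"b", "c"}):
    print(sorted(features))
```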
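The Dataset Splits row describes 5-repeated stratified 10-fold cross validation, with each 9-fold training set split 70/30 into building and search sets for wrapper feature selection. A minimal sketch of that protocol, assuming scikit-learn as a stand-in for the paper's C++ pipeline (dataset and random seeds are placeholders):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True)

# 5-repeated stratified 10-fold cross validation, as described in the paper.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=0)

for train_idx, test_idx in cv.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]
    # Split the 9-fold training set into a 70% building set and a 30% search
    # set using stratified random sampling.
    X_build, X_search, y_build, y_search = train_test_split(
        X_train, y_train, test_size=0.30, stratify=y_train, random_state=0
    )
    # Wrapper feature selection would search over feature subsets here,
    # building trees on (X_build, y_build) and scoring them on (X_search, y_search).
```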
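The Experiment Setup row reports Information Gain splitting, no tree simplification, and m = 2. A rough scikit-learn analogue of that configuration is sketched below; mapping YaDT's m parameter onto min_samples_leaf is an assumption, since the two stopping rules are only approximately equivalent.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Approximate analogue of the reported configuration:
# - criterion="entropy" corresponds to the Information Gain split measure;
# - ccp_alpha=0.0 disables cost-complexity pruning (no tree simplification);
# - min_samples_leaf=2 is an assumed counterpart of YaDT's m = 2 parameter.
clf = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.0, min_samples_leaf=2)
clf.fit(X, y)
print(clf.get_n_leaves())
```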