Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
SAT-based Decision Tree Learning for Large Data Sets
Authors: Andre Schidler, Stefan Szeider
Pages: 3904-3912
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our new approach experimentally on a range of real-world instances that contain up to several thousand samples. In almost all cases, our method successfully decreases the depth of the initial decision tree; often, the decrease is significant. |
| Researcher Affiliation | Academia | André Schidler and Stefan Szeider, Algorithms and Complexity Group, TU Wien, Vienna, Austria EMAIL, EMAIL |
| Pseudocode | Yes | The pseudo-code for DT-SLIM(H) is shown in Algorithm 1. |
| Open Source Code | Yes | Source code can be found at https://github.com/ASchidler/decisiontree and results at https://doi.org/10.5281/zenodo.4571570. |
| Open Datasets | Yes | We take all classification instances from the UCI Machine Learning Repository that use discrete domains and contain more than 500 samples. [...] Additionally, we take the instances from Narodytska et al. (2018) [...] For all the other instances we use 5-fold stratified cross-validation (also known as rotation estimation (Breiman et al. 1984; Kohavi 1995)) and report the average. |
| Dataset Splits | Yes | For all the other instances we use 5-fold stratified cross-validation (also known as rotation estimation (Breiman et al. 1984; Kohavi 1995)) and report the average. |
| Hardware Specification | Yes | We use servers with two Intel Xeon E5-2640 v4 CPUs running at 2.40 GHz and using Ubuntu 18.04. |
| Software Dependencies | Yes | We use the SAT solver Glucose 4.1 and the well-established decision tree inducers ITI 3.1 and Weka 3.8.4. |
| Experiment Setup | Yes | From the results we derive three sets of parameters: (i) d_60 = 12 and c_60 ranges from 70 to 15, (ii) d_300 = 15 and c_300 ranges from 300 to 90, and (iii) d_800 = 39 and c_800 ranges from 500 to 105. |
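
The 5-fold stratified cross-validation quoted under Dataset Splits can be illustrated with a short sketch. This is not the authors' code: the function names `stratified_k_fold` and `cross_validate`, the `train_and_score` callback, and the fixed seed are all hypothetical and chosen only for illustration. The sketch assumes that "stratified" means each fold keeps roughly the class proportions of the full data set, and that "report the average" means the mean score over the five held-out folds.

```python
from collections import defaultdict
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so that fold sizes stay balanced across classes.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


def cross_validate(samples, labels, train_and_score, k=5, seed=0):
    """Average a score over k stratified folds.

    train_and_score is a hypothetical callback taking
    (samples, labels, train_indices, test_indices) and returning a number.
    """
    folds = stratified_k_fold(labels, k, seed)
    scores = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(samples)) if i not in held_out_set]
        scores.append(train_and_score(samples, labels, train, held_out))
    return sum(scores) / len(scores)
```

In practice a library routine such as scikit-learn's `StratifiedKFold` would normally replace the hand-written split; the standard-library version here only makes the fold construction explicit.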