Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Robust Optimal Classification Trees against Adversarial Examples
Authors: Daniël Vos, Sicco Verwer (pp. 8520-8528)
AAAI 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate that the existing heuristics achieve close to optimal scores while ROCT achieves state-of-the-art scores. |
| Researcher Affiliation | Academia | Daniël Vos, Sicco Verwer Delft University of Technology EMAIL, EMAIL |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | ROCT uses a novel translation of the problem of fitting robust decision trees into Mixed-Integer Linear Programming (MILP) or Maximum Satisfiability (MaxSAT) formulations. Footnote 1: https://github.com/tudelft-cda-lab/ROCT |
| Open Datasets | Yes | The datasets are summarized in Table 3 and are available on OpenML. Footnote 4: http://www.openml.org |
| Dataset Splits | Yes | To this end we select the best value for the maximum depth hyperparameter using 3-fold stratified cross validation on the training set. |
| Hardware Specification | Yes | All of our experiments ran on 15 Intel Xeon CPU cores and 72 GB of RAM total, where each algorithm ran on a single core. |
| Software Dependencies | Yes | solve it using GUROBI 9. (...) Both algorithms use the Glucose 4.1 SAT solver. |
| Experiment Setup | Yes | In each run, every algorithm gets 30 minutes to fit. (...) To this end we select the best value for the maximum depth hyperparameter using 3-fold stratified cross validation on the training set. (...) For each dataset we used an 80%-20% train-test split. (...) As the dual of the MILP-based formulations is hard to solve, we focus the solver on the primal problem. (...) In our experiments we choose three ϵ values for each dataset such that their values correspond to an adversarial accuracy bound that is at 25%-50%-75% of the range. |
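
The data protocol quoted in the Experiment Setup row (an 80%-20% train-test split, then 3-fold stratified cross validation on the training set to pick the maximum depth) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `stratified_split`, `stratified_folds`, and `select_max_depth` are hypothetical, the split is assumed to be stratified here although the quoted text only says so for the cross validation, and `score_fn` is a placeholder callback standing in for "fit a tree of this depth on the fit indices and score it on the validation indices".

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly equal class proportions.

    Assumption: the 80%-20% split is stratified; the source does not say.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(labels, indices, n_folds=3, seed=0):
    """Partition `indices` into n_folds folds, dealing each class round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(n_folds)]
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % n_folds].append(i)
    return folds


def select_max_depth(labels, train_idx, depths, score_fn, n_folds=3, seed=0):
    """Pick the depth with the best mean validation score over the folds.

    `score_fn(depth, fit_idx, val_idx)` is a hypothetical callback that
    trains on fit_idx and returns a score (higher is better) on val_idx.
    """
    folds = stratified_folds(labels, train_idx, n_folds, seed)
    best_depth, best_score = None, float("-inf")
    for depth in depths:
        scores = []
        for k in range(n_folds):
            val_idx = folds[k]
            fit_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            scores.append(score_fn(depth, fit_idx, val_idx))
        mean_score = sum(scores) / n_folds
        if mean_score > best_score:
            best_depth, best_score = depth, mean_score
    return best_depth
```

In practice `score_fn` would wrap whichever learner is being tuned; only the training indices are ever passed to it, so the held-out 20% stays untouched during depth selection.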