Optimal Decision Diagrams for Classification
Authors: Alexandre M. Florio, Pedro Martins, Maximilian Schiffer, Thiago Serra, Thibaut Vidal
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present numerical analyses showing that our model allows training ODDs in short computational times, and that ODDs achieve better accuracy than optimal decision trees, while allowing for improved stability without significant accuracy losses. ... We conduct an extensive computational study to evaluate our approach's scalability and compare the resulting decision diagrams with classical decision trees. We observe that training ODDs requires a computational effort comparable to optimal decision-tree training but leads to more parsimonious and accurate models. ... We focus our experiments on the same selection of 54 datasets as in Bertsimas and Dunn (2017). All these datasets are publicly available from the UCI machine learning repository (Dua and Graff 2017). We split each dataset into a training, validation, and testing subset of samples with respective proportions of 50%, 25%, and 25%. |
| Researcher Affiliation | Academia | Alexandre M. Florio1,2*, Pedro Martins3*, Maximilian Schiffer4*, Thiago Serra5*, Thibaut Vidal1,2* 1 CIRRELT & SCALE-AI Chair in Data-Driven Supply Chains, 2 Department of Mathematical and Industrial Engineering, Polytechnique Montréal, Canada, 3 Department of Computer Science, Pontifical Catholic University of Rio de Janeiro, Brazil, 4 School of Management & Munich Data Science Institute, Technical University of Munich, Germany, 5 Freeman College of Management, Bucknell University, USA |
| Pseudocode | No | The paper describes algorithms and strategies (e.g., "two-step search strategy", "top-down construction approach") in prose, but it does not include a formally labeled "Pseudocode" or "Algorithm" block with structured code-like steps. |
| Open Source Code | No | All data, source code, and additional details on the computational results are provided in the supplemental material and will be provided as a public Git repository upon publication. |
| Open Datasets | Yes | We focus our experiments on the same selection of 54 datasets as in Bertsimas and Dunn (2017). All these datasets are publicly available from the UCI machine learning repository (Dua and Graff 2017). ... Dua, D.; and Graff, C. 2017. UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, http://archive.ics.uci.edu/ml, Accessed 2023-04-03. |
| Dataset Splits | Yes | We split each dataset into a training, validation, and testing subset of samples with respective proportions of 50%, 25%, and 25%. |
| Hardware Specification | Yes | Each validation and test experiment has been run on a single thread of an Intel Gold 6148 Skylake @2.40GHz CPU. |
| Software Dependencies | Yes | All our algorithms have been implemented in Python 3.8 and can be readily executed from a single script. We use Gurobi 9.1.0 (via gurobipy) for solving the mathematical models. |
| Experiment Setup | Yes | We study the computational tractability of our approach for α ∈ {0.01, 0.1, 0.2, 0.5, 1} and for four decision diagram skeletons: (1 2 4 8), (1 2 4 4 4), (1 2 3 3 3 3 3) and (1 2 2 2 2 2 2 2), hereby referred to as Skeletons I to IV, respectively. ... We set a CPU time limit of TMAX = 600 seconds for this phase. ... We repeat all our experiments five times for each dataset, using a different seed and thus a different random separation of the samples for each run. ... In Constraint (13), ε should be a small constant greater than the numerical precision of the solver (set to ε = 10⁻⁴ in our experiments) that permits to express strict inequality. ... To obtain a better initial diagram, we repeat the construction process for 60 seconds and consider only a random subset of 60% of the features during each split selection. |
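The reported protocol (a 50%/25%/25% train/validation/test split, repeated five times with a different seed each run) can be sketched as follows. This is a minimal illustration of that splitting scheme, not the authors' released code; the function name and the use of NumPy are assumptions.

```python
import numpy as np

def split_indices(n_samples, seed):
    """Illustrative sketch (not the authors' code): shuffle sample
    indices with the given seed and split them 50/25/25 into
    train/validation/test, as described in the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = n_samples // 2       # 50% training
    n_val = n_samples // 4         # 25% validation
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]   # remaining ~25% testing
    return train, val, test

# The paper repeats each experiment five times, each with a
# different seed and hence a different random separation.
splits = [split_indices(1000, seed) for seed in range(5)]
```

Each seed yields a disjoint partition of the sample indices, so the five runs measure variability due to the random split alone.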