Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Learning Optimal Decision Trees with SAT

Authors: Nina Narodytska, Alexey Ignatiev, Filipe Pereira, Joao Marques-Silva

IJCAI 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform an experimental evalution on a set of benchmarks from [Bessiere et al., 2009; Olson et al., 2017]. We ran our experiments on Intel(R) Xeon(R) CPU 3.50GHz. The Glucose3 SAT solver [Audemard and Simon, 2009] was used, with a total timeout of 1000 sec in all experiments. The memory limit is 2GB. We also employed the ITI algorithm to produce decision trees from labeled examples [Utgoff et al., 1997a; Utgoff et al., 1997b].
Researcher Affiliation	Collaboration	1 VMware Research, CA, USA 2 LASIGE, Faculdade de Ciˆencias, Universidade de Lisboa, Portugal 3 ISDCT SB RAS, Irkutsk, Russia
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks. Table 2 is a 'Description of variables', not an algorithm.
Open Source Code	No	The paper does not provide any explicit statements or links indicating that the source code for the described methodology is open or publicly available.
Open Datasets	Yes	We perform an experimental evalution on a set of benchmarks from [Bessiere et al., 2009; Olson et al., 2017]. We start with toy benchmarks from [Bessiere et al., 2009; Olson et al., 2017] that admit a small DT (less than 15 nodes). The paper also lists specific datasets in Tables 4, 5, and 6, such as 'Mouse', 'Weather', 'Car', 'Shuttle', 'Colic', 'Australian', 'Cancer', 'House Votes', etc.
Dataset Splits	No	The paper mentions a 'test set' but does not explicitly specify a distinct 'validation' set or its split percentage/methodology. It focuses on training and testing without detailing a separate validation phase.
Hardware Specification	Yes	We ran our experiments on Intel(R) Xeon(R) CPU 3.50GHz.
Software Dependencies	No	The paper mentions 'Glucose3 SAT solver [Audemard and Simon, 2009]', 'ITI algorithm [Utgoff et al., 1997a; Utgoff et al., 1997b]', and 'scikit-learn [Pedregosa and et al., 2011]' but does not provide specific version numbers for any of these software components.
Experiment Setup	Yes	The Glucose3 SAT solver [Audemard and Simon, 2009] was used, with a total timeout of 1000 sec in all experiments. The memory limit is 2GB. To find the optimal tree we perform a linear search on the total number of nodes starting from the upper bound provided by the ITI tree. We only consider trees with an odd number of nodes. We randomly sampled a fraction r of samples from datasets.