Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

SORTeD Rashomon Sets of Sparse Decision Trees: Anytime Enumeration

Authors: Elif Arslan, Jacobus van der Linden, Serge Hoogendoorn, Marco Rinaldi, Emir Demirović

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct a series of experiments with the following aims: (1) to assess SORTD's runtime efficiency in computing Rashomon sets; (2) to showcase that a small number of high-quality trees easily found by SORTD may be informative for model evaluation via variable importance analysis; and (3) to demonstrate SORTD's flexibility in enumerating and analysing Rashomon sets under varying objective functions.
Researcher Affiliation	Academia	Elif Arslan Delft University of Technology, Netherlands EMAIL Jacobus G. M. van der Linden Delft University of Technology, Netherlands J.G.M.vander EMAIL Serge Hoogendoorn Delft University of Technology, Netherlands EMAIL Marco Rinaldi Delft University of Technology, Netherlands EMAIL Emir Demirović Delft University of Technology, Netherlands EMAIL
Pseudocode	Yes	Alg. 1 shows how a search node computes its next best solution. [...] Algorithm 1 Get Next Solution() [...] Algorithm 2 Explore Candidates(...) [...] Alg. 3 adapts the depth-two subroutine [...] Algorithm 3 Calculate Three Node Sols(...) [...] Alg. 4 summarizes how the Rashomon set is computed [...] Algorithm 4 Main(...) [...] Alg. 5 outlines the procedure for generating all trees with a single branching node. [...] Algorithm 5 Calculate One Node Sol(...) [...] Alg. 6 handles the case in which the right child is a leaf [...] Algorithm 6 Calculate Two Node Sols(...)
Open Source Code	Yes	We implemented SORTD in C++ and provide it as a python package. [Footnote 1: https://github.com/ConSol-Lab/pysortd]
Open Datasets	Yes	For aims (1) and (2), we use the 30 benchmark binary classification datasets previously used to assess state-of-the-art methods [10, 15, 16, 30, 46]. [...] The original datasets can be obtained from the UCI Machine Learning repository [55] and from [51, 52, 56, 57]. For aim (3) we adopt common regression [47] and fairness benchmark datasets [48].
Dataset Splits	Yes	Each dataset was bootstrapped 20 times. [...] We run SORTD using the regularized accuracy objective, so each leaf node is penalized with the sparsity penalty λ. [...] CART and STreeD are run repeatedly within that time budget on random samples of 50% of the total dataset.
Hardware Specification	Yes	All experiments are run single-threaded on an Intel Xeon E5-6448Y @ 2.1 GHz with 100 GB RAM [49], with a 300 seconds time limit.
Software Dependencies	No	We implemented SORTD in C++ and provide it as a python package. (Explanation: Although programming languages are mentioned, no specific version numbers for Python, C++, or any specific libraries/solvers are provided.)
Experiment Setup	Yes	We varied the depth budget d ∈ {3, 4, 5} and the complexity cost λ ∈ {0.001, 0.01, 0.1}. [...] with a 300 seconds time limit. [...] We use λ = 0.01 and a depth budget of four. [...] for max-depth four, λ = 0.001, and ε = 0.1. [...] with max-depth d = 3, discrimination limit δ = 1%, and sparsity penalty λ = 0.01 (for SORTD).
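
The reported setup (depth budgets, complexity costs, 20 bootstraps, a 300-second limit) can be written down as a configuration grid. The following is a minimal sketch using only the Python standard library. It does not call the pysortd package, and the names `experiment_grid`, `bootstrap_indices`, and the dictionary keys are hypothetical. Crossing every (d, λ) pair with every bootstrap replicate is an assumption, since the excerpts above do not say how the bootstraps are combined with the parameter grid.

```python
import itertools
import random

# Values quoted in the "Experiment Setup" and "Dataset Splits" rows.
DEPTHS = [3, 4, 5]             # depth budget d
LAMBDAS = [0.001, 0.01, 0.1]   # complexity cost (sparsity penalty) lambda
N_BOOTSTRAPS = 20              # "Each dataset was bootstrapped 20 times."
TIME_LIMIT_S = 300             # per-run time limit in seconds


def bootstrap_indices(n_rows, seed):
    """Sample n_rows row indices with replacement (one bootstrap replicate)."""
    rng = random.Random(seed)
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def experiment_grid():
    """Yield one config per (depth, lambda, bootstrap) combination.

    Assumption: the full cross product is run; the source does not confirm this.
    """
    for depth, lam, rep in itertools.product(DEPTHS, LAMBDAS, range(N_BOOTSTRAPS)):
        yield {
            "max_depth": depth,
            "lambda": lam,
            "bootstrap_seed": rep,      # hypothetical: replicate index reused as seed
            "time_limit_s": TIME_LIMIT_S,
        }


configs = list(experiment_grid())
```

Under these assumptions the grid holds 3 × 3 × 20 configurations per dataset; each one would be passed to whichever solver interface is actually in use.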