Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Optimal Decision Trees for Interpretable and Constrained Clustering
Authors: Pouya Shati, Yuliang Song, Eldan Cohen, Sheila A. McIlraith
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments involving a range of real-world and synthetic datasets demonstrate that our approach can produce interpretable clustering solutions that are of superior quality compared to their non-interpretable counterparts, with or without the addition of constraints. We further provide new insights into the trade-off between interpretability and the satisfaction of user-specified constraints, presenting extensions to our clustering approach that treat the satisfaction of constraints as an additional optimization objective. |
| Researcher Affiliation | Academia | POUYA SHATI, Department of Computer Science, University of Toronto, Canada and Vector Institute for Artificial Intelligence, Canada; YULIANG SONG, Department of Mechanical and Industrial Engineering, University of Toronto, Canada; ELDAN COHEN, Department of Mechanical and Industrial Engineering, University of Toronto, Canada; SHEILA A. MCILRAITH, Department of Computer Science, University of Toronto, Canada and Vector Institute for Artificial Intelligence, Canada |
| Pseudocode | Yes | Algorithm 1 Smart Pairs |
| Open Source Code | No | The paper states: "Our approach is implemented in Python and Java and we use the Loandra solver (Berg, Demirović, et al. 2019) to solve the encodings that we generate." This describes the implementation details and tools used, but does not include an explicit statement or link confirming that the authors' own source code for the methodology is openly available or will be released. |
| Open Datasets | Yes | Our benchmarks include seven real datasets from the UCI repository (Dua and Graff 2017) and four synthetic datasets from FCPS (Ultsch and Lötsch 2020). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions using 'ground-truth labels' for evaluation and that 'each experiment is run with 20 different seeds', but no specific methodology for partitioning the datasets into distinct training, testing, or validation sets is described in the context of the clustering problem. |
| Hardware Specification | Yes | We run experiments on a server with two 12-core Intel E5-2697v2 CPUs and 128G of RAM. |
| Software Dependencies | Yes | Our approach is implemented in Python and Java and we use the Loandra solver (Berg, Demirović, et al. 2019) to solve the encodings that we generate. ... We have implemented the model using the Gurobi v10 solver. |
| Experiment Setup | Yes | We set a time limit of 30 minutes every time a call to the solver is made; a feasible solution is salvaged in case of a timeout if possible. ... For each dataset, we fix the depth value to the lowest number that provides more than twice as many leaves as there are clusters. ... Specifically, for a given p value with 0 < p ≤ (|X| − 1)/2, we generate a set of p|X| random pairwise constraints following Wagstaff and Cardie 2000: (1) We generate p|X| pairs of data points without repetition; (2) We add each pair to the must-link (ML) constraint set if both data points share the same ground-truth label and to the cannot-link (CL) set otherwise. To mitigate the potential bias due to the random generation of constraints, each experiment is run with 20 different seeds with the overall mean being reported. ... we solve Problem 1 with a parameter value of 0.1 for ML and CL sets generated with various p values. |
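The constraint-generation procedure quoted above (sample p|X| random pairs, then assign each to must-links or cannot-links based on ground-truth labels) and the depth rule (smallest depth giving more than twice as many leaves as clusters) can be sketched as follows. This is an illustrative reconstruction, not the authors' code (which, per the table, is not released); the function names `generate_pairwise_constraints` and `min_tree_depth` are assumptions for this sketch.

```python
import random


def generate_pairwise_constraints(labels, p, seed=0):
    """Sample p*|X| pairwise constraints from ground-truth labels:
    pairs with matching labels become must-links (ML),
    mismatched pairs become cannot-links (CL)."""
    rng = random.Random(seed)
    n = len(labels)
    num_pairs = int(p * n)
    # All unordered pairs of distinct points; sampling without
    # replacement yields pairs "without repetition" as described.
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    chosen = rng.sample(all_pairs, num_pairs)
    ml = [(i, j) for i, j in chosen if labels[i] == labels[j]]
    cl = [(i, j) for i, j in chosen if labels[i] != labels[j]]
    return ml, cl


def min_tree_depth(num_clusters):
    """Smallest depth d whose complete binary tree has more than
    twice as many leaves (2**d) as there are clusters."""
    d = 1
    while 2 ** d <= 2 * num_clusters:
        d += 1
    return d
```

Running each experiment with 20 seeds, as the paper describes, would correspond to calling `generate_pairwise_constraints` with 20 different `seed` values and averaging the resulting scores.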