Optimal Decision Trees For Interpretable Clustering with Constraints

Authors: Pouya Shati, Eldan Cohen, Sheila McIlraith

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
Researcher Affiliation | Academia | Department of Computer Science, University of Toronto, Toronto, Canada; Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada; Vector Institute, Toronto, Canada
Pseudocode | No | The paper describes mathematical formulations and clauses for its SAT encoding but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for its methodology is publicly available.
Open Datasets | Yes | Datasets. We run experiments on seven real datasets from the UCI repository [Dua and Graff, 2017] and four synthetic datasets from FCPS [Ultsch and Lötsch, 2020].
Dataset Splits | No | The paper specifies the datasets used for experiments but does not provide details on training, validation, or test splits. It mentions generating 20 random sets of constraints, but this is not a train/validation/test split.
Hardware Specification | Yes | We run experiments on a server with two 12-core Intel E5-2697v2 CPUs and 128G of RAM.
Software Dependencies | Yes | We use the Loandra solver [Berg et al., 2019] to solve our tree clustering encoding... We have implemented the model using the Gurobi v10 solver and extended it to support clustering constraints.
Experiment Setup | Yes | We fix the value of approximation at ε = 0.1. Consistent with previous work [Dao et al., 2016; Babaki et al., 2014], we set the solver time limit to 30 minutes. To avoid bias in results due to a specific set of constraints, we generate 20 random sets of constraints and report average values for the evaluation metrics and the runtime. For all datasets, we normalize the values of each feature in the range [0, 100].
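The Experiment Setup row quotes the preprocessing and constraint-generation protocol, but the paper's exact routine is not reproduced here. The following is a minimal Python sketch of that protocol, assuming min-max scaling to [0, 100] and must-link/cannot-link pairs sampled from ground-truth labels; the function names, pair counts, and sampling scheme are illustrative assumptions, not the authors' code.

```python
import numpy as np

def normalize_features(X):
    """Scale each feature of X (n_samples x n_features) into the range [0, 100],
    matching the normalization described in the Experiment Setup row above."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    spans = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero for constant features
    return (X - mins) / spans * 100.0

def sample_pairwise_constraints(labels, n_pairs, rng):
    """Sample random must-link / cannot-link pairs from ground-truth labels.
    NOTE: the exact sampling scheme is an assumption; the paper only states that
    20 random constraint sets were generated."""
    must_link, cannot_link = [], []
    n = len(labels)
    while len(must_link) + len(cannot_link) < n_pairs:
        i, j = rng.choice(n, size=2, replace=False)
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link

# Example: generate 20 random constraint sets on toy data, as the paper does
# before averaging evaluation metrics and runtime over them.
rng = np.random.default_rng(0)
X = rng.random((150, 4))
labels = rng.integers(0, 3, size=150)
X_norm = normalize_features(X)
constraint_sets = [sample_pairwise_constraints(labels, 50, rng) for _ in range(20)]
```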
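The Software Dependencies row mentions the Loandra MaxSAT solver for the proposed encoding and a Gurobi v10 MILP baseline extended with clustering constraints. As a rough illustration of the latter, the sketch below shows how must-link and cannot-link constraints can be written over binary assignment variables in gurobipy, with the 30-minute time limit from the Experiment Setup row; the variable layout and constraint forms are assumptions made for illustration, not the baseline's actual formulation.

```python
import gurobipy as gp
from gurobipy import GRB

def toy_constrained_assignment(n_points, k, must_link, cannot_link):
    """Illustrative gurobipy model with binary variables a[i, c] = 1 iff point i
    is assigned to cluster c, plus must-link / cannot-link constraints.
    This is NOT the paper's MILP baseline encoding, only a sketch of how such
    clustering constraints can be expressed with Gurobi (objective omitted)."""
    m = gp.Model("toy_constrained_clustering")
    a = m.addVars(n_points, k, vtype=GRB.BINARY, name="assign")
    # Each point belongs to exactly one cluster.
    m.addConstrs((a.sum(i, "*") == 1 for i in range(n_points)), name="one_cluster")
    # Must-link: both points take the same cluster.
    for (i, j) in must_link:
        for c in range(k):
            m.addConstr(a[i, c] == a[j, c])
    # Cannot-link: the two points never share a cluster.
    for (i, j) in cannot_link:
        for c in range(k):
            m.addConstr(a[i, c] + a[j, c] <= 1)
    m.Params.TimeLimit = 1800  # 30-minute solver time limit, matching the reported setup
    return m
```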