Optimal Decision Trees For Interpretable Clustering with Constraints

Authors: Pouya Shati, Eldan Cohen, Sheila McIlraith

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions.
Researcher Affiliation | Academia | Department of Computer Science, University of Toronto, Toronto, Canada; Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada; Vector Institute, Toronto, Canada
Pseudocode | No | The paper describes mathematical formulations and clauses for its SAT encoding but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for its methodology is publicly available.
Open Datasets | Yes | Datasets. We run experiments on seven real datasets from the UCI repository [Dua and Graff, 2017] and four synthetic datasets from FCPS [Ultsch and Lötsch, 2020].
Dataset Splits | No | The paper specifies the datasets used for experiments but does not provide details on training, validation, or test splits. It mentions generating 20 random sets of constraints, but this is not a train/validation/test split.
Hardware Specification | Yes | We run experiments on a server with two 12-core Intel E5-2697v2 CPUs and 128G of RAM.
Software Dependencies | Yes | We use the Loandra solver [Berg et al., 2019] to solve our tree clustering encoding... We have implemented the model using the Gurobi v10 solver and extended it to support clustering constraints.
Experiment Setup | Yes | We fix the value of approximation at ε = 0.1. Consistent with previous work [Dao et al., 2016; Babaki et al., 2014], we set the solver time limit to 30 minutes. To avoid bias in results due to a specific set of constraints, we generate 20 random sets of constraints and report average values for the evaluation metrics and the runtime. For all datasets, we normalize the values of each feature in the range [0, 100].
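The Experiment Setup row quotes the preprocessing and constraint-generation protocol, but the paper's exact routine is not reproduced here. The following is a minimal Python sketch of that protocol, assuming min-max scaling to [0, 100] and must-link/cannot-link pairs sampled from ground-truth labels; the function names, pair counts, and sampling scheme are illustrative assumptions, not the authors' code.

```python
import numpy as np

def normalize_features(X):
    """Scale each feature of X (n_samples x n_features) into the range [0, 100],
    matching the normalization described in the Experiment Setup row above."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    spans = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero for constant features
    return (X - mins) / spans * 100.0

def sample_pairwise_constraints(labels, n_pairs, rng):
    """Sample random must-link / cannot-link pairs from ground-truth labels.
    NOTE: the exact sampling scheme is an assumption; the paper only states that
    20 random constraint sets were generated."""
    must_link, cannot_link = [], []
    n = len(labels)
    while len(must_link) + len(cannot_link) < n_pairs:
        i, j = rng.choice(n, size=2, replace=False)
        if labels[i] == labels[j]:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link

# Example: generate 20 random constraint sets on toy data, as the paper does
# before averaging evaluation metrics and runtime over them.
rng = np.random.default_rng(0)
X = rng.random((150, 4))
labels = rng.integers(0, 3, size=150)
X_norm = normalize_features(X)
constraint_sets = [sample_pairwise_constraints(labels, 50, rng) for _ in range(20)]
```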
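The Software Dependencies row mentions the Loandra MaxSAT solver for the proposed encoding and a Gurobi v10 MILP baseline extended with clustering constraints. As a rough illustration of the latter, the sketch below shows how must-link and cannot-link constraints can be written over binary assignment variables in gurobipy, with the 30-minute time limit from the Experiment Setup row; the variable layout and constraint forms are assumptions made for illustration, not the baseline's actual formulation.

```python
import gurobipy as gp
from gurobipy import GRB

def toy_constrained_assignment(n_points, k, must_link, cannot_link):
    """Illustrative gurobipy model with binary variables a[i, c] = 1 iff point i
    is assigned to cluster c, plus must-link / cannot-link constraints.
    This is NOT the paper's MILP baseline encoding, only a sketch of how such
    clustering constraints can be expressed with Gurobi (objective omitted)."""
    m = gp.Model("toy_constrained_clustering")
    a = m.addVars(n_points, k, vtype=GRB.BINARY, name="assign")
    # Each point belongs to exactly one cluster.
    m.addConstrs((a.sum(i, "*") == 1 for i in range(n_points)), name="one_cluster")
    # Must-link: both points take the same cluster.
    for (i, j) in must_link:
        for c in range(k):
            m.addConstr(a[i, c] == a[j, c])
    # Cannot-link: the two points never share a cluster.
    for (i, j) in cannot_link:
        for c in range(k):
            m.addConstr(a[i, c] + a[j, c] <= 1)
    m.Params.TimeLimit = 1800  # 30-minute solver time limit, matching the reported setup
    return m
```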