Optimal Decision Trees For Interpretable Clustering with Constraints
Authors: Pouya Shati, Eldan Cohen, Sheila McIlraith
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality and interpretable constrained clustering solutions. |
| Researcher Affiliation | Academia | ¹Department of Computer Science, University of Toronto, Toronto, Canada; ²Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Canada; ³Vector Institute, Toronto, Canada |
| Pseudocode | No | The paper describes mathematical formulations and clauses for its SAT encoding but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for their methodology is publicly available. |
| Open Datasets | Yes | Datasets. We run experiments on seven real datasets from the UCI repository [Dua and Graff, 2017] and four synthetic datasets from FCPS [Ultsch and Lötsch, 2020]. |
| Dataset Splits | No | The paper specifies datasets used for experiments but does not provide details on training, validation, or test dataset splits. It mentions generating 20 random sets of constraints, but this is not a train/validation/test split. |
| Hardware Specification | Yes | We run experiments on a server with two 12-core Intel E5-2697v2 CPUs and 128G of RAM. |
| Software Dependencies | Yes | We use the Loandra solver [Berg et al., 2019] to solve our tree clustering encoding... We have implemented the model using the Gurobi v10 solver and extended it to support clustering constraints |
| Experiment Setup | Yes | We fix the value of approximation at ε = 0.1. Consistent with previous work [Dao et al., 2016; Babaki et al., 2014], we set the solver time limit to 30 minutes. To avoid bias in results due to a specific set of constraints, we generate 20 random sets of constraints and report average values for the evaluation metrics and the runtime. For all datasets, we normalize the values of each feature in the range [0, 100]. |
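
The preprocessing described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `normalize_features` and `random_constraint_sets` are hypothetical, and the sketch assumes that the clustering constraints are pairwise must-link/cannot-link constraints derived from ground-truth labels, which the quoted text does not state. Only the [0, 100] feature range and the 20 random constraint sets come from the source.

```python
import random


def normalize_features(rows, low=0.0, high=100.0):
    """Min-max scale each feature (column) of `rows` into [low, high]."""
    scaled_columns = []
    for col in zip(*rows):
        c_min, c_max = min(col), max(col)
        span = c_max - c_min
        if span == 0:
            # Constant feature: map to the lower bound (an arbitrary choice).
            scaled_columns.append([low] * len(col))
        else:
            scaled_columns.append(
                [low + (v - c_min) * (high - low) / span for v in col]
            )
    # Transpose back to one list per data point.
    return [list(point) for point in zip(*scaled_columns)]


def random_constraint_sets(labels, n_constraints, n_sets=20, seed=0):
    """Draw `n_sets` independent random sets of pairwise constraints.

    Assumption (not from the source): a sampled pair becomes a must-link
    constraint if its ground-truth labels agree, otherwise a cannot-link.
    """
    n = len(labels)
    if n_constraints > n * (n - 1) // 2:
        raise ValueError("more constraints requested than distinct pairs")
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        pairs = set()
        while len(pairs) < n_constraints:
            i, j = rng.sample(range(n), 2)
            pairs.add((min(i, j), max(i, j)))
        must_link = sorted(p for p in pairs if labels[p[0]] == labels[p[1]])
        cannot_link = sorted(p for p in pairs if labels[p[0]] != labels[p[1]])
        sets.append((must_link, cannot_link))
    return sets
```

Each returned constraint set would then be passed to a separate solver run, and the evaluation metrics and runtimes averaged over the 20 runs.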