Explaining Kernel Clustering via Decision Trees
Authors: Maximilian Fleissner, Leena Chennuru Vankadara, Debarghya Ghoshdastidar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our algorithms on a number of benchmark datasets, including three synthetic clustering datasets, Pathbased, Aggregation and Flame (Fränti & Sieranoja) and real datasets, Iris (Fisher, 1936) and Wisconsin breast cancer (Street et al., 1993). On all five datasets, we evaluate Kernel IMM as well as the greedy cost minimization algorithms from Section 5, Kernel ExKMC and Kernel Expand, both of which refine Kernel IMM by adding more leaves. |
| Researcher Affiliation | Collaboration | Maximilian Fleissner (Technical University of Munich, fleissner@cit.tum.de); Leena C. Vankadara (Amazon Research Tübingen, vleena@amazon.com); Debarghya Ghoshdastidar (Technical University of Munich, ghoshdas@cit.tum.de) |
| Pseudocode | Yes | Algorithm 1 Kernel IMM for interpretable, Taylor, or distance-based product kernels... Algorithm 2 Kernel ExKMC... Algorithm 3 Kernel Expand |
| Open Source Code | Yes | Our code is available on GitHub. |
| Open Datasets | Yes | We validate our algorithms on a number of benchmark datasets, including three synthetic clustering datasets, Pathbased, Aggregation and Flame (Fränti & Sieranoja) and real datasets, Iris (Fisher, 1936) and Wisconsin breast cancer (Street et al., 1993). |
| Dataset Splits | No | The paper uses benchmark datasets but does not describe training, validation, or test splits (e.g., percentages or sample counts) for reproducibility. It implies some model selection ("choosing the best agreement with the ground truth as our starting point for Kernel IMM") but gives no split details. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only discusses the experimental setup in terms of kernels, hyperparameters, and datasets. |
| Software Dependencies | No | The paper mentions using "scikit-learn" for Rand index computation (Pedregosa et al., 2011) and refers to "CART" for comparison, but it does not specify version numbers for any software dependencies. (A hedged snippet using these scikit-learn utilities follows the table.) |
| Experiment Setup | Yes | When the Gaussian kernel is chosen, we run Kernel IMM both on the surrogate Taylor features from Definition 3 with M = 5, as well as on the surrogate features based on the kernel matrix, as defined in Equation (4), and choose the better one. We then refine the partition induced by Kernel IMM using both Kernel ExKMC as well as Kernel Expand, constructing m = 6, m = 10 and m = 4 leaves respectively. (A hedged sketch of this pipeline follows the table.) |
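The two real datasets quoted above ship with scikit-learn, as does the Rand index the paper uses for evaluation. The snippet below is a hedged sketch, not the authors' code: the choice of the adjusted (rather than raw) Rand index and of plain k-means as the baseline clusterer are our assumptions for illustration.

```python
# Hedged sketch, not the authors' code: load the two real datasets named in
# the paper and score a baseline clustering against ground truth with the
# adjusted Rand index from scikit-learn (Pedregosa et al., 2011).
from sklearn.datasets import load_iris, load_breast_cancer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

for loader in (load_iris, load_breast_cancer):
    X, y = loader(return_X_y=True)
    k = len(set(y))  # number of ground-truth classes
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(loader.__name__, adjusted_rand_score(y, labels))
```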
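The experiment-setup row can likewise be made concrete. The sketch below is an approximation under loudly stated assumptions, not the paper's Kernel IMM: Nystroem features stand in for the surrogate Taylor / kernel-matrix features (Definition 3 / Equation 4), k-means on those features stands in for Gaussian kernel k-means, and a CART tree with a fixed leaf budget stands in for the Kernel IMM tree and its ExKMC/Expand-style refinement to m leaves; the bandwidth gamma is an illustrative value, not the paper's.

```python
# Hedged sketch, NOT the paper's Kernel IMM / Kernel ExKMC / Kernel Expand:
#  - Nystroem features stand in for the surrogate Taylor / kernel-matrix
#    features of Definition 3 and Equation (4);
#  - k-means on those features stands in for Gaussian kernel k-means;
#  - DecisionTreeClassifier with a leaf budget stands in for the IMM-style
#    tree and its refinement to m leaves.
from sklearn.datasets import load_iris
from sklearn.kernel_approximation import Nystroem
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import adjusted_rand_score

X, _ = load_iris(return_X_y=True)
k = 3  # number of clusters

# Explicit surrogate feature map for the Gaussian (RBF) kernel.
phi = Nystroem(kernel="rbf", gamma=0.5, n_components=50,
               random_state=0).fit_transform(X)
kernel_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(phi)

# Axis-aligned explanation trees on the raw coordinates: k leaves for the
# IMM-style tree, then m > k leaves as a crude analogue of refinement
# (m = 4, 6, 10 are the leaf budgets quoted in the table above).
for m in (k, 4, 6, 10):
    tree = DecisionTreeClassifier(max_leaf_nodes=m, random_state=0)
    tree.fit(X, kernel_labels)
    agree = adjusted_rand_score(kernel_labels, tree.predict(X))
    print(f"{m} leaves: agreement with kernel clustering = {agree:.3f}")
```

This only mirrors the shape of the pipeline (kernel clustering, then a small axis-aligned tree, then refinement with more leaves); the paper's algorithms choose splits to minimize the kernel clustering cost rather than classification impurity.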