Explaining Kernel Clustering via Decision Trees

Authors: Maximilian Fleissner, Leena Chennuru Vankadara, Debarghya Ghoshdastidar

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our algorithms on a number of benchmark datasets, including three synthetic clustering datasets, Pathbased, Aggregation and Flame (Fränti & Sieranoja), and real datasets, Iris (Fisher, 1936) and Wisconsin breast cancer (Street et al., 1993). On all five datasets, we evaluate Kernel IMM as well as the greedy cost minimization algorithms from Section 5, Kernel ExKMC and Kernel Expand, both of which refine Kernel IMM by adding more leaves.
Researcher Affiliation | Collaboration | Maximilian Fleissner, Technical University of Munich (fleissner@cit.tum.de); Leena C. Vankadara, Amazon Research Tübingen (vleena@amazon.com); Debarghya Ghoshdastidar, Technical University of Munich (ghoshdas@cit.tum.de)
Pseudocode | Yes | Algorithm 1: Kernel IMM for interpretable, Taylor, or distance-based product kernels... Algorithm 2: Kernel ExKMC... Algorithm 3: Kernel Expand. An IMM-style splitting sketch appears after the table.
Open Source Code | Yes | Our code is available on GitHub.
Open Datasets | Yes | We validate our algorithms on a number of benchmark datasets, including three synthetic clustering datasets, Pathbased, Aggregation and Flame (Fränti & Sieranoja), and real datasets, Iris (Fisher, 1936) and Wisconsin breast cancer (Street et al., 1993). A loading sketch for these datasets appears after the table.
Dataset Splits | No | The paper mentions benchmark datasets but does not explicitly describe training, validation, and test splits (e.g., percentages or sample counts) needed for reproducibility. It implies some model selection, stating the authors choose "the best agreement with the ground truth as our starting point for Kernel IMM", but gives no split details.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments; it describes the experimental setup only in terms of kernels, hyperparameters, and datasets.
Software Dependencies | No | The paper mentions scikit-learn in the context of Rand index computation (Pedregosa et al., 2011) and refers to CART for comparison, but it does not specify version numbers for any software dependency. A short Rand index example appears after the table.
Experiment Setup | Yes | When the Gaussian kernel is chosen, we run Kernel IMM both on the surrogate Taylor features from Definition 3 with M = 5, as well as on the surrogate features based on the kernel matrix, as defined in Equation (4), and choose the better one. We then refine the partition induced by Kernel IMM using both Kernel ExKMC as well as Kernel Expand, constructing m = 6, m = 10 and m = 4 leaves respectively. A hedged sketch of the truncated Taylor feature map follows the table.
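
Algorithm 1 (Kernel IMM) builds a threshold tree over surrogate features in the spirit of Iterative Mistake Minimization. The following is a minimal sketch of one IMM-style node split in Python, assuming a surrogate feature matrix `Phi` and kernel k-means labels `labels`; it is a generic illustration, not the paper's exact Algorithm 1, which additionally distinguishes interpretable, Taylor, and distance-based product kernels.

```python
import numpy as np

def imm_split(Phi, labels):
    """One IMM-style node split: pick the axis-aligned cut (feature j,
    threshold theta) on the surrogate features that separates the
    cluster means while misassigning the fewest points."""
    clusters = np.unique(labels)
    # Cluster means in the surrogate feature space.
    mu = {c: Phi[labels == c].mean(axis=0) for c in clusters}
    best_j, best_theta, best_mistakes = None, None, np.inf
    for j in range(Phi.shape[1]):
        # Candidate thresholds lie between consecutive mean coordinates.
        coords = sorted(mu[c][j] for c in clusters)
        for lo, hi in zip(coords, coords[1:]):
            theta = (lo + hi) / 2.0
            point_side = Phi[:, j] <= theta
            mean_side = np.array([mu[c][j] <= theta for c in labels])
            # A "mistake" is a point cut off from its own cluster mean.
            mistakes = int(np.sum(point_side != mean_side))
            if mistakes < best_mistakes:
                best_j, best_theta, best_mistakes = j, theta, mistakes
    return best_j, best_theta, best_mistakes
```

Recursing on the two sides of the chosen cut until every leaf contains a single cluster yields a threshold tree with one leaf per cluster, which Kernel ExKMC and Kernel Expand can then refine by adding leaves.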
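
All five benchmark datasets are publicly available. In the sketch below, Iris and Wisconsin breast cancer ship with scikit-learn; Pathbased, Aggregation and Flame are distributed by Fränti & Sieranoja as plain-text files, and the assumed column layout (x, y, label) and filename are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer, load_iris

# Real datasets: bundled with scikit-learn.
X_iris, y_iris = load_iris(return_X_y=True)
X_wbc, y_wbc = load_breast_cancer(return_X_y=True)

# Synthetic shape datasets (Pathbased, Aggregation, Flame) are commonly
# distributed as whitespace-separated "x y label" files; the filename
# below is an assumption.
def load_shape_dataset(path):
    data = np.loadtxt(path)
    return data[:, :2], data[:, 2].astype(int)

# e.g. X_path, y_path = load_shape_dataset("pathbased.txt")
```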
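
Since agreement with the ground truth is measured via the Rand index computed with scikit-learn, a minimal example follows; whether the paper reports the plain or the adjusted variant is not stated in the excerpt above.

```python
from sklearn.metrics import adjusted_rand_score, rand_score

y_true = [0, 0, 1, 1, 2, 2]  # ground-truth cluster labels
y_pred = [0, 0, 1, 2, 2, 2]  # labels induced by the threshold tree

print(rand_score(y_true, y_pred))           # plain Rand index
print(adjusted_rand_score(y_true, y_pred))  # chance-corrected variant
```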
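
The truncated Taylor feature map for the Gaussian kernel can be written down explicitly. The sketch below builds degree-at-most-M polynomial surrogate features whose inner products approximate exp(-||x - y||^2 / (2 sigma^2)); the exact scaling in the paper's Definition 3 may differ, and the kernel-matrix surrogate features of Equation (4) are not reproduced here.

```python
import itertools
import math
import numpy as np

def gaussian_taylor_features(X, sigma=1.0, M=5):
    """Truncated Taylor feature map phi for the Gaussian kernel, so that
    <phi(x), phi(y)> approximates exp(-||x - y||^2 / (2 sigma^2)) up to
    degree M. The paper's Definition 3 may use an equivalent but
    differently scaled construction."""
    n, d = X.shape
    envelope = np.exp(-np.sum(X**2, axis=1) / (2 * sigma**2))
    columns = []
    for degree in range(M + 1):
        # Each multiset of coordinate indices encodes one multi-index alpha.
        for idx in itertools.combinations_with_replacement(range(d), degree):
            alpha = np.bincount(np.array(idx, dtype=int), minlength=d)
            alpha_fact = math.prod(math.factorial(a) for a in alpha)
            monomial = np.prod(X**alpha, axis=1)  # x^alpha
            columns.append(envelope * monomial
                           / (sigma**degree * math.sqrt(alpha_fact)))
    return np.column_stack(columns)
```

For the low-dimensional benchmarks above the feature count, binom(d + M, M), stays small (126 features for Iris with d = 4 and M = 5), so tree induction on these surrogate features remains cheap.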