Explaining Kernel Clustering via Decision Trees

Authors: Maximilian Fleissner, Leena Chennuru Vankadara, Debarghya Ghoshdastidar

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our algorithms on a number of benchmark datasets, including three synthetic clustering datasets, Pathbased, Aggregation and Flame (Fränti & Sieranoja), and real datasets, Iris (Fisher, 1936) and Wisconsin breast cancer (Street et al., 1993). On all five datasets, we evaluate Kernel IMM as well as the greedy cost minimization algorithms from Section 5, Kernel ExKMC and Kernel Expand, both of which refine Kernel IMM by adding more leaves.
Researcher Affiliation | Collaboration | Maximilian Fleissner, Technical University of Munich (fleissner@cit.tum.de); Leena C. Vankadara, Amazon Research Tübingen (vleena@amazon.com); Debarghya Ghoshdastidar, Technical University of Munich (ghoshdas@cit.tum.de)
Pseudocode | Yes | Algorithm 1: Kernel IMM for interpretable, Taylor, or distance-based product kernels... Algorithm 2: Kernel ExKMC... Algorithm 3: Kernel Expand. An IMM-style splitting sketch appears after the table.
Open Source Code | Yes | Our code is available on GitHub.
Open Datasets | Yes | We validate our algorithms on a number of benchmark datasets, including three synthetic clustering datasets, Pathbased, Aggregation and Flame (Fränti & Sieranoja), and real datasets, Iris (Fisher, 1936) and Wisconsin breast cancer (Street et al., 1993). A loading sketch for these datasets appears after the table.
Dataset Splits | No | The paper mentions benchmark datasets but does not explicitly describe training, validation, and test splits (e.g., percentages or sample counts) needed for reproducibility. It implies some model selection, stating the authors choose "the best agreement with the ground truth as our starting point for Kernel IMM", but gives no split details.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments; it describes the experimental setup only in terms of kernels, hyperparameters, and datasets.
Software Dependencies | No | The paper mentions scikit-learn in the context of Rand index computation (Pedregosa et al., 2011) and refers to CART for comparison, but it does not specify version numbers for any software dependency. A short Rand index example appears after the table.
Experiment Setup | Yes | When the Gaussian kernel is chosen, we run Kernel IMM both on the surrogate Taylor features from Definition 3 with M = 5, as well as on the surrogate features based on the kernel matrix, as defined in Equation (4), and choose the better one. We then refine the partition induced by Kernel IMM using both Kernel ExKMC as well as Kernel Expand, constructing m = 6, m = 10 and m = 4 leaves respectively. A hedged sketch of the truncated Taylor feature map follows the table.
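
Algorithm 1 (Kernel IMM) builds a threshold tree over surrogate features in the spirit of Iterative Mistake Minimization. The following is a minimal sketch of one IMM-style node split in Python, assuming a surrogate feature matrix `Phi` and kernel k-means labels `labels`; it is a generic illustration, not the paper's exact Algorithm 1, which additionally distinguishes interpretable, Taylor, and distance-based product kernels.

```python
import numpy as np

def imm_split(Phi, labels):
    """One IMM-style node split: pick the axis-aligned cut (feature j,
    threshold theta) on the surrogate features that separates the
    cluster means while misassigning the fewest points."""
    clusters = np.unique(labels)
    # Cluster means in the surrogate feature space.
    mu = {c: Phi[labels == c].mean(axis=0) for c in clusters}
    best_j, best_theta, best_mistakes = None, None, np.inf
    for j in range(Phi.shape[1]):
        # Candidate thresholds lie between consecutive mean coordinates.
        coords = sorted(mu[c][j] for c in clusters)
        for lo, hi in zip(coords, coords[1:]):
            theta = (lo + hi) / 2.0
            point_side = Phi[:, j] <= theta
            mean_side = np.array([mu[c][j] <= theta for c in labels])
            # A "mistake" is a point cut off from its own cluster mean.
            mistakes = int(np.sum(point_side != mean_side))
            if mistakes < best_mistakes:
                best_j, best_theta, best_mistakes = j, theta, mistakes
    return best_j, best_theta, best_mistakes
```

Recursing on the two sides of the chosen cut until every leaf contains a single cluster yields a threshold tree with one leaf per cluster, which Kernel ExKMC and Kernel Expand can then refine by adding leaves.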
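
All five benchmark datasets are publicly available. In the sketch below, Iris and Wisconsin breast cancer ship with scikit-learn; Pathbased, Aggregation and Flame are distributed by Fränti & Sieranoja as plain-text files, and the assumed column layout (x, y, label) and filename are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer, load_iris

# Real datasets: bundled with scikit-learn.
X_iris, y_iris = load_iris(return_X_y=True)
X_wbc, y_wbc = load_breast_cancer(return_X_y=True)

# Synthetic shape datasets (Pathbased, Aggregation, Flame) are commonly
# distributed as whitespace-separated "x y label" files; the filename
# below is an assumption.
def load_shape_dataset(path):
    data = np.loadtxt(path)
    return data[:, :2], data[:, 2].astype(int)

# e.g. X_path, y_path = load_shape_dataset("pathbased.txt")
```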
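
Since agreement with the ground truth is measured via the Rand index computed with scikit-learn, a minimal example follows; whether the paper reports the plain or the adjusted variant is not stated in the excerpt above.

```python
from sklearn.metrics import adjusted_rand_score, rand_score

y_true = [0, 0, 1, 1, 2, 2]  # ground-truth cluster labels
y_pred = [0, 0, 1, 2, 2, 2]  # labels induced by the threshold tree

print(rand_score(y_true, y_pred))           # plain Rand index
print(adjusted_rand_score(y_true, y_pred))  # chance-corrected variant
```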
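
The truncated Taylor feature map for the Gaussian kernel can be written down explicitly. The sketch below builds degree-at-most-M polynomial surrogate features whose inner products approximate exp(-||x - y||^2 / (2 sigma^2)); the exact scaling in the paper's Definition 3 may differ, and the kernel-matrix surrogate features of Equation (4) are not reproduced here.

```python
import itertools
import math
import numpy as np

def gaussian_taylor_features(X, sigma=1.0, M=5):
    """Truncated Taylor feature map phi for the Gaussian kernel, so that
    <phi(x), phi(y)> approximates exp(-||x - y||^2 / (2 sigma^2)) up to
    degree M. The paper's Definition 3 may use an equivalent but
    differently scaled construction."""
    n, d = X.shape
    envelope = np.exp(-np.sum(X**2, axis=1) / (2 * sigma**2))
    columns = []
    for degree in range(M + 1):
        # Each multiset of coordinate indices encodes one multi-index alpha.
        for idx in itertools.combinations_with_replacement(range(d), degree):
            alpha = np.bincount(np.array(idx, dtype=int), minlength=d)
            alpha_fact = math.prod(math.factorial(a) for a in alpha)
            monomial = np.prod(X**alpha, axis=1)  # x^alpha
            columns.append(envelope * monomial
                           / (sigma**degree * math.sqrt(alpha_fact)))
    return np.column_stack(columns)
```

For the low-dimensional benchmarks above the feature count, binom(d + M, M), stays small (126 features for Iris with d = 4 and M = 5), so tree induction on these surrogate features remains cheap.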