The SpectACl of Nonconvex Clustering: A Spectral Approach to Density-Based Clustering

Authors: Sibylle Hess, Wouter Duivesteijn, Philipp Honysz, Katharina Morik
Pages: 3788-3795

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type Experimental Through experiments on synthetic and real-world data, we demonstrate that our approach provides robust and reliable clusterings.
Researcher Affiliation Academia Sibylle Hess, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany, sibylle.hess@tu-dortmund.de; Wouter Duivesteijn, Technische Universiteit Eindhoven, Faculteit Wiskunde & Informatica, Data Mining Group, Eindhoven, the Netherlands, w.duivesteijn@tue.nl; Philipp Honysz and Katharina Morik, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany, philipp.honysz@tu-dortmund.de, katharina.morik@tu-dortmund.de
Pseudocode Yes We summarize the resulting method SPECTACL (Spectral Averagely-dense Clustering) with the following steps: 1. compute the adjacency matrix $W$; 2. compute the truncated eigendecomposition $W \approx V^{(d)} \Lambda^{(d)} {V^{(d)}}^\top$; 3. compute the projected embedding $U_{jk} = |V^{(d)}_{jk}|\,|\lambda_k|^{1/2}$; 4. compute a k-means clustering, finding $r$ clusters on the embedded data $U$.
Open Source Code Yes Our Python implementation, and the data generating and evaluation script, are publicly available (https://sfb876.tu-dortmund.de/spectacl).
Open Datasets Yes We generate benchmark datasets, using the renowned scikit library. For each shape (moons, circles, and blobs) and noise specification, we generate m = 1500 data points. The noise is Gaussian, as provided by the scikit noise parameter; cf. http://scikit-learn.org. ... The Pulsar dataset ... The Sloan dataset ... The MNIST dataset (Lecun et al. 1998) is a well-known collection of handwritten ciphers. The SNAP dataset refers to the Email EU core network data (Leskovec and Krevl 2014)...
Dataset Splits No The paper mentions generating synthetic datasets and using real-world datasets for evaluation, but it does not specify any explicit training, validation, or test splits (e.g., percentages, sample counts, or cross-validation setup) for reproduction.
Hardware Specification No The paper does not provide any specific hardware details such as GPU/CPU models, memory, or types of computing resources (e.g., cloud instances, clusters) used for running the experiments.
Software Dependencies No The paper mentions a "Python implementation" and the "scikit library" but gives no version numbers for these or for any other software dependency needed for replication.
Experiment Setup Yes For DBSCAN, we use the same ϵ as for SPECTACL and set the parameter minPts = 10, which delivered the best performance on average. We also compare against the provided Python implementation of Robust Spectral Clustering (Bojchevski, Matkovic, and Günnemann 2017) (denoted RSC), where default values apply. Unless mentioned otherwise, our setting for the embedding dimensionality is d = 50.
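
The four steps quoted in the Pseudocode row can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' released implementation. Several choices are assumptions that the summary above does not confirm: W is built as a binary ϵ-neighborhood graph, the d retained eigenpairs are those with the largest absolute eigenvalue, and k-means is a plain Lloyd iteration with a deterministic farthest-first initialization. All function names (epsilon_adjacency, spectacl_embedding, kmeans, spectacl_sketch) are hypothetical.

```python
import numpy as np


def epsilon_adjacency(X, eps):
    """Step 1 (assumed construction): binary epsilon-neighborhood graph W."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = (sq_dist <= eps ** 2).astype(float)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def spectacl_embedding(W, d):
    """Steps 2 and 3: truncated eigendecomposition, then U_jk = |V_jk| * |lambda_k|^(1/2)."""
    lam, V = np.linalg.eigh(W)  # W is symmetric, so eigh applies
    # Assumption: keep the d eigenpairs with the largest |lambda|.
    top = np.argsort(-np.abs(lam))[:d]
    return np.abs(V[:, top]) * np.sqrt(np.abs(lam[top]))


def kmeans(U, r, n_iter=100):
    """Step 4: Lloyd's k-means with farthest-first initialization (assumed variant)."""
    centers = [U[0]]
    for _ in range(1, r):
        # Squared distance of every point to its nearest chosen center.
        gaps = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[gaps.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        updated = np.array([
            U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(r)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


def spectacl_sketch(X, eps, d, r):
    """Run the four steps in order and return one cluster label per row of X."""
    W = epsilon_adjacency(X, eps)
    U = spectacl_embedding(W, d)
    return kmeans(U, r)
```

A call such as spectacl_sketch(X, eps, d=50, r=2) would mirror the reported default d = 50, with eps playing the role of the ϵ that the setup shares with DBSCAN. The dense distance matrix needs memory quadratic in the number of points, so the sketch suits datasets of roughly the m = 1500 scale of the synthetic benchmarks rather than large graphs.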