The SpectACl of Nonconvex Clustering: A Spectral Approach to Density-Based Clustering
Authors: Sibylle Hess, Wouter Duivesteijn, Philipp Honysz, Katharina Morik
Pages: 3788-3795
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on synthetic and real-world data, we demonstrate that our approach provides robust and reliable clusterings. |
| Researcher Affiliation | Academia | Sibylle Hess, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany, sibylle.hess@tu-dortmund.de; Wouter Duivesteijn, Technische Universiteit Eindhoven, Faculteit Wiskunde & Informatica, Data Mining Group, Eindhoven, the Netherlands, w.duivesteijn@tue.nl; Philipp Honysz and Katharina Morik, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany, philipp.honysz@tu-dortmund.de, katharina.morik@tu-dortmund.de |
| Pseudocode | Yes | We summarize the resulting method SPECTACL (Spectral Averagely-dense Clustering) with the following steps: 1. compute the adjacency matrix $W$; 2. compute the truncated eigendecomposition $W \approx V^{(d)} \Lambda^{(d)} V^{(d)\top}$; 3. compute the projected embedding $U_{jk} = \lvert V^{(d)}_{jk} \rvert \, \lvert \lambda_k \rvert^{1/2}$; 4. compute a k-means clustering, finding $r$ clusters on the embedded data $U$. |
| Open Source Code | Yes | Our Python implementation, and the data generating and evaluation script, are publicly available². ² https://sfb876.tu-dortmund.de/spectacl |
| Open Datasets | Yes | We generate benchmark datasets, using the renowned scikit library. For each shape (moons, circles, and blobs) and noise specification, we generate m = 1500 data points. The noise is Gaussian, as provided by the scikit noise parameter; cf. http://scikit-learn.org. ... The Pulsar dataset³ ... The Sloan dataset⁴ ... The MNIST dataset (Lecun et al. 1998) is a well-known collection of handwritten ciphers. The SNAP dataset refers to the Email EU core network data (Leskovec and Krevl 2014)... |
| Dataset Splits | No | The paper mentions generating synthetic datasets and using real-world datasets for evaluation, but it does not specify any explicit training, validation, or test splits (e.g., percentages, sample counts, or cross-validation setup) for reproduction. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or types of computing resources (e.g., cloud instances, clusters) used for running the experiments. |
| Software Dependencies | No | The paper mentions a "Python implementation" and the "scikit library" but does not give version numbers for these or for any other software dependency needed for replication. |
| Experiment Setup | Yes | For DBSCAN, we use the same ϵ as for SPECTACL and set the parameter minPts = 10, which delivered the best performance on average. We also compare against the provided Python implementation of Robust Spectral Clustering (Bojchevski, Matkovic, and Günnemann 2017) (denoted RSC), where default values apply. Unless mentioned otherwise, our setting for the embedding dimensionality is d = 50. |
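
The four steps quoted in the Pseudocode row can be sketched in Python on one of the synthetic shapes from the Open Datasets row. This is a minimal illustration, not the authors' released implementation. It assumes that W is a binary ε-neighborhood graph and that the truncation keeps the d eigenvalues of largest magnitude. The helper name `spectacl_sketch`, the noise level 0.1, and ε = 0.2 are hypothetical choices made only for this example; m = 1500 and d = 50 come from the table above.

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.neighbors import radius_neighbors_graph


def spectacl_sketch(X, eps, d=50, r=2, seed=0):
    """Hypothetical helper following the four listed steps; not the authors' code."""
    # Step 1: adjacency matrix W.
    # Assumption: binary, symmetric eps-neighborhood graph without self-loops.
    W = radius_neighbors_graph(X, radius=eps, mode="connectivity", include_self=False)

    # Step 2: truncated eigendecomposition W ~ V Lambda V^T.
    # Assumption: keep the d eigenvalues of largest magnitude.
    lam, V = eigsh(W, k=d, which="LM")

    # Step 3: projected embedding U_jk = |V_jk| * |lambda_k|^(1/2).
    U = np.abs(V) * np.sqrt(np.abs(lam))

    # Step 4: k-means with r clusters on the embedded data U.
    labels = KMeans(n_clusters=r, n_init=10, random_state=seed).fit_predict(U)
    return labels, U


# Two moons with m = 1500 points; the noise level and eps are illustrative guesses.
X, _ = make_moons(n_samples=1500, noise=0.1, random_state=0)
labels, U = spectacl_sketch(X, eps=0.2, d=50, r=2)
print(np.bincount(labels))  # cluster sizes
```

The sparse graph from `radius_neighbors_graph` is passed straight to `eigsh`, so the dense 1500 × 1500 matrix is never formed. Results will depend strongly on ε, which the table does not specify for any dataset.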