reproducibilityindex.ai

The SpectACl of Nonconvex Clustering: A Spectral Approach to Density-Based Clustering

Authors: Sibylle Hess, Wouter Duivesteijn, Philipp Honysz, Katharina Morik
Pages: 3788-3795

AAAI 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through experiments on synthetic and real-world data, we demonstrate that our approach provides robust and reliable clusterings.
Researcher Affiliation	Academia	Sibylle Hess, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany, sibylle.hess@tu-dortmund.de; Wouter Duivesteijn, Technische Universiteit Eindhoven, Faculteit Wiskunde & Informatica, Data Mining Group, Eindhoven, the Netherlands, w.duivesteijn@tue.nl; Philipp Honysz and Katharina Morik, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany, philipp.honysz@tu-dortmund.de, katharina.morik@tu-dortmund.de
Pseudocode	Yes	We summarize the resulting method SPECTACL (Spectral Averagely-dense Clustering) with the following steps: 1. compute the adjacency matrix $W$; 2. compute the truncated eigendecomposition $W \approx V^{(d)} \Lambda^{(d)} {V^{(d)}}^{\top}$; 3. compute the projected embedding $U_{jk} = |V^{(d)}_{jk}| \, |\lambda_k|^{1/2}$; 4. compute a k-means clustering, finding $r$ clusters on the embedded data $U$.
Open Source Code	Yes	Our Python implementation, and the data generating and evaluation script, are publicly available (footnote 2: https://sfb876.tu-dortmund.de/spectacl).
Open Datasets	Yes	We generate benchmark datasets, using the renowned scikit library. For each shape (moons, circles, and blobs) and noise specification we generate m = 1500 data points. The noise is Gaussian, as provided by the scikit noise parameter; cf. http://scikit-learn.org. ... The Pulsar dataset (footnote 3) ... The Sloan dataset (footnote 4) ... The MNIST dataset (Lecun et al. 1998) is a well-known collection of handwritten ciphers. The SNAP dataset refers to the Email EU core network data (Leskovec and Krevl 2014)...
Dataset Splits	No	The paper mentions generating synthetic datasets and using real-world datasets for evaluation, but it does not specify any explicit training, validation, or test splits (e.g., percentages, sample counts, or cross-validation setup) for reproduction.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU/CPU models, memory, or types of computing resources (e.g., cloud instances, clusters) used for running the experiments.
Software Dependencies	No	The paper mentions a "Python implementation" and the "scikit library" but gives no version numbers for these or for any other software dependency needed for replication.
Experiment Setup	Yes	For DBSCAN, we use the same ϵ as for SPECTACL and set the parameter minPts = 10, which delivered the best performance on average. We also compare against the provided Python implementation of Robust Spectral Clustering (Bojchevski, Matkovic, and Günnemann 2017) (denoted RSC), where default values apply. Unless mentioned otherwise, our setting for the embedding dimensionality is d = 50.
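
The four steps quoted in the Pseudocode row can be sketched in Python as follows. This is a minimal illustration, not the authors' released implementation. It assumes that W is an unweighted ϵ-neighborhood graph and that the truncated eigendecomposition keeps the d eigenvalues of largest magnitude. The function name `spectacl_sketch` and the values `epsilon=0.2` and `noise=0.05` are hypothetical choices made for this example only. The DBSCAN call mirrors the Experiment Setup row (same ϵ, minPts = 10).

```python
import numpy as np
from scipy.sparse.linalg import eigsh
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.neighbors import radius_neighbors_graph


def spectacl_sketch(X, epsilon, n_clusters, d=50, random_state=0):
    """Illustrative version of the four listed steps (hypothetical helper)."""
    # Step 1: adjacency matrix W.
    # Assumption: unweighted epsilon-neighborhood graph without self-loops.
    W = radius_neighbors_graph(
        X, radius=epsilon, mode="connectivity", include_self=False
    ).astype(np.float64)

    # Step 2: truncated eigendecomposition W ~ V Lambda V^T.
    # Assumption: keep the d eigenvalues of largest magnitude.
    # eigsh requires k < n, so d is clamped for very small inputs.
    d = min(d, X.shape[0] - 1)
    eigvals, eigvecs = eigsh(W, k=d, which="LM")

    # Step 3: projected embedding U_jk = |V_jk| * |lambda_k|^(1/2).
    # Broadcasting scales column k of |V| by sqrt(|lambda_k|).
    U = np.abs(eigvecs) * np.sqrt(np.abs(eigvals))

    # Step 4: k-means with r = n_clusters clusters on the rows of U.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return kmeans.fit_predict(U)


# Usage on a synthetic "moons" dataset with m = 1500 points, as in the
# Open Datasets row. The noise level and epsilon are illustrative guesses.
X, _ = make_moons(n_samples=1500, noise=0.05, random_state=0)
labels = spectacl_sketch(X, epsilon=0.2, n_clusters=2, d=50)

# DBSCAN baseline with the same epsilon and minPts = 10.
dbscan_labels = DBSCAN(eps=0.2, min_samples=10).fit_predict(X)
```

The sketch returns one integer cluster label per data point. Since the table records no library versions, results from this sketch should not be expected to match the paper's numbers.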