Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
The SpectACl of Nonconvex Clustering: A Spectral Approach to Density-Based Clustering
Authors: Sibylle Hess, Wouter Duivesteijn, Philipp Honysz, Katharina Morik
Pages: 3788-3795
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on synthetic and real-world data, we demonstrate that our approach provides robust and reliable clusterings. |
| Researcher Affiliation | Academia | Sibylle Hess, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany; Wouter Duivesteijn, Technische Universiteit Eindhoven, Faculteit Wiskunde & Informatica, Data Mining Group, Eindhoven, the Netherlands; Philipp Honysz and Katharina Morik, TU Dortmund University, Computer Science Faculty, Artificial Intelligence LS VIII, D-44221 Dortmund, Germany |
| Pseudocode | Yes | We summarize the resulting method SpectACl (Spectral Averagely-dense Clustering) with the following steps: 1. compute the adjacency matrix $W$; 2. compute the truncated eigendecomposition $W \approx V^{(d)}\Lambda^{(d)}V^{(d)\top}$; 3. compute the projected embedding $U_{jk} = \lvert V^{(d)}_{jk}\rvert\,\lvert\lambda_k\rvert^{1/2}$; 4. compute a k-means clustering, finding $r$ clusters on the embedded data $U$. |
| Open Source Code | Yes | Our Python implementation, and the data generating and evaluation script, are publicly available at https://sfb876.tu-dortmund.de/spectacl. |
| Open Datasets | Yes | We generate benchmark datasets, using the renowned scikit library. For each shape (moons, circles, and blobs) and noise specification we generate m = 1500 data points. The noise is Gaussian, as provided by the scikit noise parameter; cf. http://scikit-learn.org. ... The Pulsar dataset ... The Sloan dataset ... The MNIST dataset (Lecun et al. 1998) is a well-known collection of handwritten ciphers. The SNAP dataset refers to the Email EU core network data (Leskovec and Krevl 2014)... |
| Dataset Splits | No | The paper mentions generating synthetic datasets and using real-world datasets for evaluation, but it does not specify any explicit training, validation, or test splits (e.g., percentages, sample counts, or cross-validation setup) for reproduction. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or types of computing resources (e.g., cloud instances, clusters) used for running the experiments. |
| Software Dependencies | No | The paper mentions a Python implementation and the scikit library but does not provide version numbers for these or any other software dependencies needed for replication. |
| Experiment Setup | Yes | For DBSCAN, we use the same ϵ as for SpectACl and set the parameter minPts = 10, which delivered the best performance on average. We also compare against the provided Python implementation of Robust Spectral Clustering (Bojchevski, Matkovic, and Günnemann 2017) (denoted RSC), where default values apply. Unless mentioned otherwise, our setting for the embedding dimensionality is d = 50. |
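The four steps in the Pseudocode row can be sketched in Python as below. This is a minimal sketch, not the authors' released implementation. It assumes that W is the binary ε-neighborhood adjacency matrix (consistent with the shared ε mentioned in the Experiment Setup row, but still an assumption here), computes a dense `numpy.linalg.eigh` decomposition and keeps the d eigenpairs of largest eigenvalue magnitude, and relies on scikit-learn's `radius_neighbors_graph` and `KMeans`. The function name `spectacl_sketch` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import radius_neighbors_graph


def spectacl_sketch(X, n_clusters, epsilon, d=50, random_state=0):
    """Hypothetical sketch of the four SpectACl steps (not the authors' code).

    Assumption: W is the binary epsilon-neighborhood graph, i.e. W[i, j] = 1
    if x_j lies within distance epsilon of x_i (i != j), else 0.
    """
    # Step 1: adjacency matrix W (symmetric, 0/1 entries, no self-loops).
    W = radius_neighbors_graph(
        X, radius=epsilon, mode="connectivity", include_self=False
    ).toarray()

    # Step 2: truncated eigendecomposition W ~ V^(d) Lambda^(d) V^(d)^T.
    # eigh returns all eigenpairs of the symmetric matrix W; we keep the d
    # eigenpairs whose eigenvalues have the largest magnitude (assumption).
    eigvals, eigvecs = np.linalg.eigh(W)
    d = min(d, W.shape[0])
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    lam, V = eigvals[top], eigvecs[:, top]

    # Step 3: projected embedding U_jk = |V_jk| * |lambda_k|^(1/2);
    # broadcasting scales column k of |V| by sqrt(|lambda_k|).
    U = np.abs(V) * np.sqrt(np.abs(lam))

    # Step 4: k-means with r = n_clusters clusters on the embedded data U.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    return kmeans.fit_predict(U)
```

A dense eigendecomposition keeps the sketch deterministic and easy to check on small inputs; for large graphs a sparse solver such as `scipy.sparse.linalg.eigsh` would be the natural substitute.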
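The synthetic benchmarks described in the Open Datasets row can be approximated with scikit-learn's `make_moons` and `make_circles`, both of which expose the Gaussian `noise` parameter the quote refers to. The noise levels, random seed, and default circle `factor` below are hypothetical placeholders, not values taken from the paper. Blobs are omitted because `make_blobs` controls spread through `cluster_std` rather than a `noise` parameter, and how noise was specified for blobs is not stated in this summary. The helper `make_benchmark` is a hypothetical name.

```python
from sklearn.datasets import make_circles, make_moons

M = 1500  # points per dataset, as stated in the Open Datasets row
NOISE_LEVELS = [0.0, 0.05, 0.10]  # hypothetical noise settings, not from the paper


def make_benchmark(shape, noise, random_state=0):
    """Return (X, y) for one synthetic benchmark shape (hypothetical helper).

    Assumption: circles use scikit-learn's default factor=0.8; the paper's
    exact generator settings are not given in this summary.
    """
    if shape == "moons":
        return make_moons(n_samples=M, noise=noise, random_state=random_state)
    if shape == "circles":
        return make_circles(n_samples=M, noise=noise, random_state=random_state)
    raise ValueError(f"unsupported shape: {shape!r}")


# One (X, y) pair per shape and noise level.
benchmarks = {
    (shape, noise): make_benchmark(shape, noise)
    for shape in ("moons", "circles")
    for noise in NOISE_LEVELS
}
```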
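The DBSCAN baseline in the Experiment Setup row maps onto scikit-learn's `DBSCAN`, where `eps` plays the role of ε and `min_samples` the role of minPts. In this sketch the ε value, the moons test data, and the use of the adjusted Rand index as a score are assumptions for illustration only; the paper shares ε with SpectACl, but its value is not given here. The helper `dbscan_baseline` is a hypothetical name.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

EPSILON = 0.1  # hypothetical radius; the paper reuses SpectACl's epsilon
MIN_PTS = 10   # minPts = 10, as stated in the Experiment Setup row


def dbscan_baseline(X, epsilon=EPSILON, min_pts=MIN_PTS):
    """DBSCAN with eps shared with SpectACl and min_samples = minPts.

    Returns one label per point; -1 marks points DBSCAN treats as noise.
    """
    return DBSCAN(eps=epsilon, min_samples=min_pts).fit_predict(X)


X, y = make_moons(n_samples=1500, noise=0.05, random_state=0)
labels = dbscan_baseline(X)
# ARI is an illustrative choice of score, not necessarily the paper's metric.
ari = adjusted_rand_score(y, labels)
```

Note that scikit-learn counts the point itself toward `min_samples`, so the threshold may differ by one from definitions of minPts that exclude the query point.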