Fast and Simple Spectral Clustering in Theory and Practice

Authors: Peter Macgregor

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate the new algorithm on several synthetic and real-world datasets, finding that it is significantly faster than alternative clustering algorithms, while producing results with approximately the same clustering accuracy."
Researcher Affiliation | Academia | "Peter Macgregor, School of Informatics, University of Edinburgh, peter.macgregor@ed.ac.uk"
Pseudocode | Yes | "Algorithm 1: POWERMETHOD(M ∈ R^{n×n}, x0 ∈ R^n, t ∈ Z_{≥0}); Algorithm 2: FASTSPECTRALCLUSTER(G = (V, E), k ∈ Z_{≥0}, ϵ ∈ [0, 1])"
Open Source Code | Yes | "The code to reproduce the experiments is available at https://github.com/pmacg/fast-spectral-clustering."
Open Datasets | Yes | "We evaluate the spectral clustering algorithms on synthetic data drawn from the stochastic block model. [...] MNIST [18]: [...] Pen Digits [1]: [...] Fashion [34]: [...] HAR (Human Activity Recognition) [2]: [...] Letter [9]: [...] The datasets are all made available by the Open ML [31] project, and can be downloaded with the scikit-learn library [27]."
Dataset Splits | No | The paper evaluates on synthetic and real-world datasets but specifies no explicit training, validation, or test splits (no percentages, sample counts, or references to predefined splits). For synthetic data it describes generation parameters; for real-world data it describes pre-processing steps but no data partitioning.
Hardware Specification | Yes | "All experiments are performed on an HP laptop with an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz processor and 32 GB RAM."
Software Dependencies | No | "We implement all algorithms in Python, using the numpy [12], scipy [32], stag [22], and scikit-learn [27] libraries for matrix manipulation, eigenvector computation, graph processing, and k-means approximation respectively." While the libraries are named, no version numbers are given for Python or any of these libraries.
Experiment Setup | Yes | "Remark 3.3. The constants in the definition of l and t in Algorithm 2 are based on those in the analysis of Makarychev et al. [23] and this paper. In practice, we find that setting l = log(k) and t = 10 log(n/k) works well. [...] Given parameters n ∈ Z_{≥0}, k ∈ Z_{≥0}, p ∈ [0, 1], and q ∈ [0, 1], we generate a graph G = (V, E) with n vertices and k ground-truth clusters S1, . . . , Sk of size n/k. [...] We first pre-process each dataset by computing the k nearest neighbour graph from the data, for k = 10."
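The signature of Algorithm 1, POWERMETHOD(M, x0, t), matches the classical power iteration. A minimal sketch under that assumption (the normalisation scheme and function body below are a reconstruction from the signature alone, not the paper's actual pseudocode):

```python
import numpy as np

def power_method(M, x0, t):
    """Run t steps of power iteration: x <- M x / ||M x||.

    Converges to the dominant eigenvector of M when the top eigenvalue
    is well separated and x0 is not orthogonal to that eigenvector.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(t):
        x = M @ x
        x = x / np.linalg.norm(x)
    return x

# Example: the dominant eigenvector of diag(3, 1) is the first basis vector.
M = np.diag([3.0, 1.0])
x = power_method(M, np.array([1.0, 1.0]), t=50)
```

Each iteration multiplies the component along the top eigenvector by the eigenvalue gap ratio, so convergence is geometric; this is why a small t such as 10 log(n/k) can suffice in practice.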
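The synthetic experiments draw graphs from the stochastic block model with parameters n, k, p, q and k equal-size ground-truth clusters. A minimal sampler consistent with that description (the helper name `sample_sbm` and the dense adjacency representation are illustrative choices, not the paper's code):

```python
import numpy as np

def sample_sbm(n, k, p, q, seed=None):
    """Sample an SBM adjacency matrix with k equal-size clusters:
    each within-cluster edge appears with probability p and each
    between-cluster edge with probability q, independently."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n // k)          # ground-truth cluster labels
    same = labels[:, None] == labels[None, :]         # same-cluster mask
    probs = np.where(same, p, q)                      # per-pair edge probability
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample upper triangle only
    A = (upper | upper.T).astype(int)                 # symmetrise; no self-loops
    return A, labels

A, labels = sample_sbm(n=100, k=4, p=0.9, q=0.05, seed=0)

# With p >> q the clusters are visible in the edge densities.
mask = labels[:, None] == labels[None, :]
within = A[mask & ~np.eye(100, dtype=bool)].mean()
between = A[~mask].mean()
```

Sampling only the upper triangle and mirroring it keeps the graph undirected and avoids double-sampling each pair.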
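The real-world pre-processing step, building a k nearest neighbour graph with k = 10, can be sketched with scikit-learn's `kneighbors_graph`. The random placeholder features below stand in for a dataset that would normally be fetched from OpenML via `sklearn.datasets.fetch_openml` (the exact loading code is an assumption; the paper only states the datasets come from OpenML via scikit-learn):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Placeholder features; the paper's datasets (MNIST, Fashion, ...) would
# instead be loaded, e.g. via sklearn.datasets.fetch_openml("mnist_784").
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Sparse 10-nearest-neighbour graph, symmetrised so the graph is undirected
# (an edge exists if either endpoint lists the other among its 10 neighbours).
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity")
A = A.maximum(A.T)
```

Symmetrising with an element-wise maximum is one common convention; taking the intersection of neighbour lists instead would give a sparser mutual k-NN graph.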