Fast and Simple Spectral Clustering in Theory and Practice
Authors: Peter Macgregor
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the new algorithm on several synthetic and real-world datasets, finding that it is significantly faster than alternative clustering algorithms, while producing results with approximately the same clustering accuracy. |
| Researcher Affiliation | Academia | Peter Macgregor School of Informatics University of Edinburgh peter.macgregor@ed.ac.uk |
| Pseudocode | Yes | Algorithm 1: POWERMETHOD(M ∈ ℝ^(n×n), x0 ∈ ℝ^n, t ∈ ℤ≥0); Algorithm 2: FASTSPECTRALCLUSTER(G = (V, E), k ∈ ℤ≥0, ε ∈ [0, 1]) |
| Open Source Code | Yes | The code to reproduce the experiments is available at https://github.com/pmacg/fast-spectral-clustering. |
| Open Datasets | Yes | We evaluate the spectral clustering algorithms on synthetic data drawn from the stochastic block model. [...] MNIST [18]: [...] Pen Digits [1]: [...] Fashion [34]: [...] HAR (Human Activity Recognition) [2]: [...] Letter [9]: [...] The datasets are all made available by the Open ML [31] project, and can be downloaded with the scikit-learn library [27]. |
| Dataset Splits | No | The paper mentions using synthetic and real-world datasets for evaluation but does not specify explicit training, validation, or test splits with percentages, sample counts, or references to predefined splits for reproducibility. For synthetic data, it describes generation parameters, and for real-world data, it states pre-processing steps but no data partitioning for training/validation/testing. |
| Hardware Specification | Yes | All experiments are performed on an HP laptop with an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz processor and 32 GB RAM. |
| Software Dependencies | No | We implement all algorithms in Python, using the numpy [12], scipy [32], stag [22], and scikit-learn [27] libraries for matrix manipulation, eigenvector computation, graph processing, and k-means approximation respectively. Although these libraries are named, no version numbers are given for Python or for any of the libraries. |
| Experiment Setup | Yes | Remark 3.3. The constants in the definition of l and t in Algorithm 2 are based on those in the analysis of Makarychev et al. [23] and this paper. In practice, we find that setting l = log(k) and t = 10 log(n/k) works well. [...] Given parameters n ∈ ℤ>0, k ∈ ℤ>0, p ∈ [0, 1], and q ∈ [0, 1], we generate a graph G = (V, E) with n vertices and k ground-truth clusters S1, . . . , Sk of size n/k. [...] We first pre-process each dataset by computing the k nearest neighbour graph from the data, for k = 10. |
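The pseudocode row above names Algorithm 1, POWERMETHOD(M ∈ ℝ^(n×n), x0 ∈ ℝ^n, t ∈ ℤ≥0). A minimal sketch of a standard power method in Python (the paper's implementation language) is below; the per-step normalisation is an assumption for numerical stability and may differ from the paper's exact algorithm:

```python
import numpy as np

def power_method(M, x0, t):
    """Sketch of a power method: approximate the dominant eigenvector
    of M by repeatedly multiplying a starting vector x0 by M for t
    rounds, normalising after each step to keep the iterate stable."""
    x = np.asarray(x0, dtype=float)
    for _ in range(t):
        x = M @ x                  # one round of multiplication by M
        x /= np.linalg.norm(x)     # renormalise (assumed detail)
    return x
```

For example, with M = diag(3, 1) and a generic starting vector, the iterate converges to the eigenvector for the eigenvalue 3, i.e. (±1, 0).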
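The experiment-setup row describes sampling synthetic graphs from the stochastic block model with n vertices, k equal-sized ground-truth clusters, and edge probabilities p (within a cluster) and q (between clusters). A hedged sketch of such a generator, assuming k divides n and using only numpy (the paper's own generator may differ in details):

```python
import numpy as np

def sample_sbm(n, k, p, q, seed=0):
    """Sketch of a stochastic block model sampler: n vertices split
    into k ground-truth clusters of size n/k; each within-cluster edge
    appears independently with probability p, each between-cluster
    edge with probability q. Returns (adjacency matrix, labels)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n // k)           # cluster of each vertex
    same = labels[:, None] == labels[None, :]          # same-cluster mask
    probs = np.where(same, p, q)                       # per-pair edge probability
    upper = (rng.random((n, n)) < probs).astype(int)
    A = np.triu(upper, 1)                              # keep strict upper triangle
    A = A + A.T                                        # symmetrise, no self-loops
    return A, labels
```

For the real-world datasets, the stated pre-processing (a k nearest neighbour graph with k = 10) can be reproduced with scikit-learn's `kneighbors_graph(X, n_neighbors=10)`.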