A Tighter Analysis of Spectral Clustering, and Beyond

Authors: Peter Macgregor, He Sun

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Besides its conceptual and theoretical significance, the practical impact of our work is demonstrated by the empirical analysis on both synthetic and real-world datasets, in which spectral clustering produces comparable or better results with fewer than k eigenvectors.
Researcher Affiliation | Academia | School of Informatics, University of Edinburgh, Edinburgh, United Kingdom. Correspondence to: Peter Macgregor <peter.macgregor@ed.ac.uk>, He Sun <h.sun@ed.ac.uk>.
Pseudocode | No | The paper describes the steps of the spectral clustering algorithm in Section 4 as numbered lists but does not present a formal pseudocode block or an algorithm box; an illustrative sketch of the standard pipeline is included below the table.
Open Source Code | Yes | We detail the experiment setup in the appendix, and the code to reproduce our results is available at https://github.com/pmacg/spectral-clustering-meta-graphs.
Open Datasets | Yes | Our work is also related to studies on designing local, and distributed clustering algorithms based on different assumptions (e.g., (Czumaj et al., 2015; Orecchia & Zhu, 2014; Zhu et al., 2013)); ... The significance of this result is further demonstrated by our extensive experimental analysis on the well-known BSDS, MNIST, and USPS datasets (Arbelaez et al., 2011; Hull, 1994; Le Cun et al., 1998).
Dataset Splits | No | The paper does not explicitly describe train/validation/test splits; it only mentions that the BSDS dataset 'consists of 500 images along with their ground-truth segmentations' and that the authors 'construct the k-nearest neighbour graph for k = 3' for MNIST and USPS.
Hardware Specification | Yes | Our experiments on synthetic data are performed on a desktop computer with an Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz processor and 16 GB RAM. The experiments on the BSDS, MNIST, and USPS datasets are performed on a compute server with 64 AMD EPYC 7302 16-Core Processors.
Software Dependencies | No | 'We implement the spectral clustering algorithm in Python, using the scipy library for computing eigenvectors, and the k-means algorithm from the sklearn library.' While the paper names the libraries, specific version numbers are not provided.
Experiment Setup | No | The paper describes general aspects of its experiment setup, such as how similarity graphs are constructed (e.g., 'add an edge with weight exp(-||u - v||^2 / (2σ^2)) where σ = 20'), but it does not specify hyperparameters such as learning rates, batch sizes, or optimizer settings typically found in 'experimental setup' sections for training machine learning models; a sketch of the quoted graph constructions is included below the table.
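
The following is a minimal sketch of how the pipeline quoted in the Pseudocode and Software Dependencies rows is typically implemented: normalised Laplacian eigenvectors via scipy, then k-means via sklearn. The normalisation, the eigensolver options, and the num_eigenvectors parameter (allowing fewer than k eigenvectors, as the paper advocates) are illustrative assumptions, not the authors' exact procedure from Section 4.

    # Minimal sketch of a standard spectral clustering pipeline
    # (illustrative only; not the authors' exact procedure).
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh
    from sklearn.cluster import KMeans

    def spectral_clustering(adjacency, k, num_eigenvectors=None):
        """Cluster a graph (sparse adjacency matrix) into k clusters.

        num_eigenvectors < k mirrors the paper's observation that fewer than
        k eigenvectors can suffice; the default of k eigenvectors is the
        classical algorithm.
        """
        if num_eigenvectors is None:
            num_eigenvectors = k
        degrees = np.asarray(adjacency.sum(axis=1)).flatten()
        # Guard against isolated vertices before taking 1/sqrt(degree).
        d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(degrees, 1e-12)))
        # Normalised Laplacian L = I - D^{-1/2} A D^{-1/2}.
        laplacian = sp.eye(adjacency.shape[0]) - d_inv_sqrt @ adjacency @ d_inv_sqrt
        # Eigenvectors for the smallest eigenvalues of L.
        _, vectors = eigsh(laplacian, k=num_eigenvectors, which="SM")
        return KMeans(n_clusters=k, n_init=10).fit_predict(vectors)

For instance, spectral_clustering(adjacency, k=10, num_eigenvectors=5) would produce 10 clusters from only 5 eigenvectors, the regime the paper studies.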
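
Similarly, the Experiment Setup and Dataset Splits rows quote a Gaussian edge weight exp(-||u - v||^2 / (2σ^2)) with σ = 20 and a k-nearest-neighbour graph with k = 3 for MNIST and USPS. The sketch below shows one plausible reading of those quotes; the symmetrisation and connectivity-mode choices are assumptions, not details taken from the paper.

    # Sketch of the two quoted similarity-graph constructions
    # (details beyond the quoted weight function and k = 3 are assumptions).
    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    def gaussian_weight(u, v, sigma=20.0):
        # Edge weight exp(-||u - v||^2 / (2 * sigma^2)), as quoted from the paper.
        return np.exp(-np.linalg.norm(u - v) ** 2 / (2 * sigma ** 2))

    def knn_graph(data, k=3):
        # k-nearest-neighbour graph (k = 3 for MNIST/USPS in the paper),
        # symmetrised by taking the elementwise maximum with its transpose.
        graph = kneighbors_graph(data, n_neighbors=k, mode="connectivity")
        return graph.maximum(graph.T)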