Sparse Quantized Spectral Clustering
Authors: Zhenyu Liao, Romain Couillet, Michael W. Mahoney
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations. In particular, we show that very little change occurs in the informative eigenstructure, even under drastic sparsification/quantization, and consequently that very little downstream performance loss occurs when working with very aggressively sparsified or quantized spectral clustering problems. We illustrate how these results depend on the nonlinearity, we characterize a phase transition beyond which spectral clustering becomes possible, and we show when such nonlinear transformations can introduce spurious non-informative eigenvectors. Experiments on real-world data further corroborate our findings. |
| Researcher Affiliation | Academia | Zhenyu Liao, ICSI and Department of Statistics, University of California, Berkeley, USA (zhenyu.liao@berkeley.edu); Romain Couillet, G-STATS Data Science Chair, GIPSA-lab, University Grenoble Alpes, France (romain.couillet@gipsa-lab.grenoble-inp.fr); Michael W. Mahoney, ICSI and Department of Statistics, University of California, Berkeley, USA (mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Figure 3: Here, p = 512, n = 256, µ ∼ N(0, I_p), v = [−1_{n/2}; 1_{n/2}] on Gaussian data. The results are obtained by averaging over 250 runs. ... Figure 7 next evaluates the clustering performance... on the popular MNIST datasets (LeCun et al., 1998). ... Figure 15 compares the clustering performances... on other MNIST-like datasets including the Fashion-MNIST (Xiao et al., 2017), Kuzushiji-MNIST (Clanuwat et al., 2018), and Kannada-MNIST (Prabhu, 2019) datasets. Then, Figure 16 compares the performances on the representations of the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | No | The paper discusses clustering performance and misclassification rates, but does not explicitly provide specific dataset split information (e.g., percentages, sample counts for training/validation/test sets) for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Figure 3: ...p = 512, n = 256, µ ∼ N(0, I_p), v = [−1_{n/2}; 1_{n/2}] on Gaussian data. The results are obtained by averaging over 250 runs. ... Figure 7: ...with n = 2048 and performance of the linear function in black. Results averaged over 100 runs. |
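Since the paper provides no code, a minimal sketch of the experimental idea may help: two-class Gaussian data with the dimensions quoted above (p = 512, n = 256), a linear Gram matrix, and an aggressively quantized kernel f(t) = sign(t) used for spectral clustering. The generative details (signal strength, seed, use of the top eigenvector's sign as the cluster assignment) are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 512, 256  # dimensions from the paper's Figure 3 setup

# Two-class Gaussian mixture x_i = v_i * mu + z_i (an assumed, common model);
# v holds the +/-1 class labels, mu is a random mean direction.
v = np.concatenate([-np.ones(n // 2), np.ones(n // 2)])
mu = rng.standard_normal(p)
mu *= 8.0 / np.linalg.norm(mu)  # signal strength chosen for illustration
X = np.outer(mu, v) + rng.standard_normal((p, n))

K = X.T @ X / p        # linear Gram matrix
K_quant = np.sign(K)   # drastic 1-bit quantization of the kernel entries
np.fill_diagonal(K_quant, 0)

# Spectral clustering step: assign classes by the sign of the top eigenvector.
eigvals, eigvecs = np.linalg.eigh(K_quant)
u = eigvecs[:, -1]     # eigenvector of the largest eigenvalue
pred = np.sign(u)
acc = max(np.mean(pred == v), np.mean(pred == -v))  # invariant to label swap
print(f"clustering accuracy with 1-bit quantized kernel: {acc:.3f}")
```

At this (assumed) signal level the informative eigenvector survives the sign quantization almost intact, which is the qualitative phenomenon the paper characterizes via random matrix theory.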