Sparse Quantized Spectral Clustering
Authors: Zhenyu Liao, Romain Couillet, Michael W. Mahoney
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations. In particular, we show that very little change occurs in the informative eigenstructure, even under drastic sparsification/quantization, and consequently that very little downstream performance loss occurs when working with very aggressively sparsified or quantized spectral clustering problems. We illustrate how these results depend on the nonlinearity, we characterize a phase transition beyond which spectral clustering becomes possible, and we show when such nonlinear transformations can introduce spurious non-informative eigenvectors. Experiments on real-world data further corroborate our findings. |
| Researcher Affiliation | Academia | Zhenyu Liao, ICSI and Department of Statistics, University of California, Berkeley, USA (zhenyu.liao@berkeley.edu); Romain Couillet, G-STATS Data Science Chair, GIPSA-lab, University Grenoble Alpes, France (romain.couillet@gipsa-lab.grenoble-inp.fr); Michael W. Mahoney, ICSI and Department of Statistics, University of California, Berkeley, USA (mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Figure 3: Here, p = 512, n = 256, µ ∼ N(0, I_p), v = [−1_{n/2}; 1_{n/2}] on Gaussian data. The results are obtained by averaging over 250 runs. ... Figure 7 next evaluates the clustering performance... on the popular MNIST datasets (LeCun et al., 1998). ... Figure 15 compares the clustering performances... on other MNIST-like datasets including the Fashion-MNIST (Xiao et al., 2017), Kuzushiji-MNIST (Clanuwat et al., 2018), and Kannada-MNIST (Prabhu, 2019) datasets. Then, Figure 16 compares the performances on the representations of the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | No | The paper discusses clustering performance and misclassification rates, but does not explicitly provide specific dataset split information (e.g., percentages, sample counts for training/validation/test sets) for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Figure 3: ...p = 512, n = 256, µ ∼ N(0, I_p), v = [−1_{n/2}; 1_{n/2}] on Gaussian data. The results are obtained by averaging over 250 runs. ... Figure 7: ...with n = 2048 and performance of the linear function in black. Results averaged over 100 runs. |
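Since the paper provides no code, a minimal sketch of the experimental idea may help: two-class Gaussian data with the dimensions quoted above (p = 512, n = 256), a linear Gram matrix, and an aggressively quantized kernel f(t) = sign(t) used for spectral clustering. The generative details (signal strength, seed, use of the top eigenvector's sign as the cluster assignment) are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 512, 256  # dimensions from the paper's Figure 3 setup

# Two-class Gaussian mixture x_i = v_i * mu + z_i (an assumed, common model);
# v holds the +/-1 class labels, mu is a random mean direction.
v = np.concatenate([-np.ones(n // 2), np.ones(n // 2)])
mu = rng.standard_normal(p)
mu *= 8.0 / np.linalg.norm(mu)  # signal strength chosen for illustration
X = np.outer(mu, v) + rng.standard_normal((p, n))

K = X.T @ X / p        # linear Gram matrix
K_quant = np.sign(K)   # drastic 1-bit quantization of the kernel entries
np.fill_diagonal(K_quant, 0)

# Spectral clustering step: assign classes by the sign of the top eigenvector.
eigvals, eigvecs = np.linalg.eigh(K_quant)
u = eigvecs[:, -1]     # eigenvector of the largest eigenvalue
pred = np.sign(u)
acc = max(np.mean(pred == v), np.mean(pred == -v))  # invariant to label swap
print(f"clustering accuracy with 1-bit quantized kernel: {acc:.3f}")
```

At this (assumed) signal level the informative eigenvector survives the sign quantization almost intact, which is the qualitative phenomenon the paper characterizes via random matrix theory.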