Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

Authors: Romain Couillet, Florent Chatelain, Nicolas Le Bihan

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We notably prove, and empirically confirm on various real image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering or PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models." Also: "simulations on Fashion-MNIST and BigGAN-generated images qualitatively (and partially quantitatively) confirm our theoretical findings, justifying the possibility to drastically reduce computational cost with virtually no impairment on classification performance."
Researcher Affiliation | Academia | GIPSA-lab, CNRS, Grenoble-INP, University Grenoble-Alps; CentraleSupélec, University Paris-Saclay.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "All codes to reproduce our figures are available in the gitlab repository https://gricad-gitlab.univ-grenoble-alpes.fr/chatelaf/two-way-kernel-matrix-puncturing."
Open Datasets | Yes | Simulations use Fashion-MNIST and BigGAN-generated images; the paper cites Brock et al. (2018) for the BigGAN images.
Dataset Splits | No | The paper mentions synthetic data with specific n and p values, and real-world image datasets (Fashion-MNIST, BigGAN) with varying n values, but does not provide explicit training, validation, or test splits (e.g., percentages or counts).
Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Two puncturing approaches are compared: (i) reducing the cost of the inner products x_iᵀx_j via a 5-fold random puncturing of the data vectors x_i (ε_S = 0.2 while ε_B = 1), versus (ii) a 25-fold puncturing of the matrix (1/p)XᵀX (ε_B = 0.04 while ε_S = 1). The reported scenario is interesting in that ε_B ε_S² c_0⁻¹ is purposely kept constant in both cases (ε_B ε_S² = 0.04 either way); as such, while the matrices K and their spectra differ dramatically, the eigenvector v̂_2 is essentially the same in both. For synthetic data: n = 4 000 and p = 2 000 (so c_0 = 1/2), with x_i ∼ 0.4 N(µ_1, I_p) + 0.6 N(µ_2, I_p) for [µ_1ᵀ, µ_2ᵀ]ᵀ ∼ N(0, (1/p) [[20, 12], [12, 30]] ⊗ I_p), and diagonal puncturing weight b = 0. For real-world images: p = 4 096 VGG features of randomly BigGAN-generated images, and p = 784-dimensional real-world images from the Fashion-MNIST dataset. (Minimal sketches of the synthetic and Fashion-MNIST setups follow the table.)
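
To make the Experiment Setup row concrete, here is a minimal NumPy sketch of the two puncturing schemes on the paper's synthetic two-class model. The function names (puncture_S, puncture_B), the mean-thresholding rule, and the reading of v̂_2 as the eigenvector of the second-largest eigenvalue are our illustration choices; exact normalizations may differ from the authors' gitlab code, which remains the reference implementation.

import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)

# Paper's synthetic setup: n = 4000 samples, p = 2000 features, c0 = p/n = 1/2.
n, p = 4000, 2000

# Class means: [mu1^T, mu2^T]^T ~ N(0, (1/p) [[20, 12], [12, 30]] (x) I_p),
# i.e. each coordinate pair (mu1[k], mu2[k]) is an i.i.d. 2-D Gaussian.
C = np.array([[20.0, 12.0], [12.0, 30.0]]) / p
M = rng.multivariate_normal(np.zeros(2), C, size=p)    # shape (p, 2)
mu1, mu2 = M[:, 0], M[:, 1]

# Mixture x_i ~ .4 N(mu1, I_p) + .6 N(mu2, I_p), stored columnwise in X.
labels = (rng.random(n) < 0.6).astype(int)             # class 2 with prob .6
X = rng.standard_normal((p, n))
X += np.where(labels == 1, mu2[:, None], mu1[:, None])

def puncture_S(X, eps_S, rng):
    """S-puncturing: zero out each data entry independently with prob 1 - eps_S."""
    return X * (rng.random(X.shape) < eps_S)

def puncture_B(K, eps_B, rng, b=0.0):
    """B-puncturing: keep each off-diagonal kernel entry with prob eps_B,
    symmetrically, and set the diagonal to b (the paper's setup uses b = 0)."""
    mask = np.triu(rng.random(K.shape) < eps_B, k=1)
    K = K * (mask | mask.T)
    np.fill_diagonal(K, b)
    return K

# Scenario (i): 5-fold data puncturing (eps_S = .2, eps_B = 1).
S = puncture_S(X, 0.2, rng)
K_i = (S.T @ S) / p

# Scenario (ii): 25-fold kernel puncturing (eps_S = 1, eps_B = .04).
K_ii = puncture_B((X.T @ X) / p, 0.04, rng)

# In both cases eps_B * eps_S^2 = 0.04: the invariant the paper keeps constant.
for name, K in [("(i) S-punctured", K_i), ("(ii) B-punctured", K_ii)]:
    vals, vecs = eigsh(K, k=2, which="LA")             # two largest eigenpairs
    v2 = vecs[:, np.argsort(vals)[-2]]                 # second-largest eigenvalue
    pred = (v2 > v2.mean()).astype(int)                # threshold the eigenvector
    acc = max(np.mean(pred == labels), np.mean(pred != labels))
    print(f"{name}: clustering accuracy from v2 = {acc:.3f}")

The point the table makes should be visible here: the spectra of K_i and K_ii differ strongly, yet thresholding v̂_2 yields essentially the same partition in both scenarios.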
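The real-data experiments follow the same pattern. Below is a hedged sketch applying the same two-way puncturing to a two-class subset of Fashion-MNIST (p = 784); the keras loader, the class pair (t-shirt vs trouser), the subsample size, and the centering step are our stand-ins, not necessarily what the authors' code does.

import numpy as np
from scipy.sparse.linalg import eigsh
from tensorflow.keras.datasets import fashion_mnist   # stand-in loader

rng = np.random.default_rng(0)

# Two-class subset of Fashion-MNIST, flattened to p = 784 as in the paper.
(x, y), _ = fashion_mnist.load_data()
idx = np.flatnonzero(np.isin(y, (0, 1)))          # hypothetical class pair
idx = rng.choice(idx, size=2000, replace=False)   # subsample to keep the demo light
X = x[idx].reshape(len(idx), -1).astype(float).T / 255.0   # (p, n) = (784, 2000)
labels = (y[idx] == 1).astype(int)
X -= X.mean(axis=1, keepdims=True)                # center each feature

p = X.shape[0]

# Scenario (i) from the table: 5-fold data puncturing (eps_S = .2, eps_B = 1).
eps_S, eps_B = 0.2, 1.0
S = X * (rng.random(X.shape) < eps_S)             # S-puncturing of the data entries
K = (S.T @ S) / p
mask = np.triu(rng.random(K.shape) < eps_B, k=1)
K = K * (mask | mask.T)                           # B-puncturing (no-op here, eps_B = 1)
np.fill_diagonal(K, 0.0)                          # b = 0, as in the paper's setup

# Cluster from a top eigenvector of K, as in the synthetic sketch above.
vals, vecs = eigsh(K, k=2, which="LA")
v2 = vecs[:, np.argsort(vals)[-2]]
pred = (v2 > v2.mean()).astype(int)
print("clustering accuracy:", max(np.mean(pred == labels), np.mean(pred != labels)))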