Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering
Authors: Romain Couillet, Florent Chatelain, Nicolas Le Bihan
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We notably prove, and empirically confirm on various real image databases, that it is possible to drastically puncture the data, thereby providing possibly huge computational and storage gains, for a virtually constant (clustering or PCA) performance. This preliminary study opens as such the path towards rethinking, from a large dimensional standpoint, computational and storage costs in elementary machine learning models. Simulations on Fashion-MNIST and BigGAN-generated images qualitatively (and partially quantitatively) confirm our theoretical findings, justifying the possibility to drastically reduce computational cost with virtually no impairment on classification performance. |
| Researcher Affiliation | Academia | 1. GIPSA-lab, CNRS, Grenoble-INP, University Grenoble-Alps; 2. CentraleSupélec, University Paris-Saclay. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All codes to reproduce our figures are available in the gitlab repository https://gricad-gitlab.univ-grenoble-alpes.fr/chatelaf/two-way-kernel-matrix-puncturing. |
| Open Datasets | Yes | Simulations use Fashion-MNIST and BigGAN-generated images; the paper cites Brock et al., 2018 for the BigGAN images. |
| Dataset Splits | No | The paper mentions synthetic data with specific `n` and `p` values, and real-world image datasets (Fashion-MNIST, Big GAN) with varying `n` values, but does not provide explicit training, validation, or test splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Two puncturing approaches are compared: (i) reducing the cost of the inner products $x_i^T x_j$ via a 5-fold ($\varepsilon_S = .2$ while $\varepsilon_B = 1$) random puncturing of the data vectors $x_i$, versus (ii) a 25-fold puncturing of the matrix $\frac{1}{p} X^T X$ ($\varepsilon_B = .04$ while $\varepsilon_S = 1$). The reported scenario is interesting in that the authors purposely took $\varepsilon_B \varepsilon_S^2 c_0^{-1}$ constant in both cases; as such, while the matrices $K$ and their spectra dramatically differ, the eigenvector $\hat{v}_2$ is essentially the same for both matrices. For synthetic data, the paper uses $n = 4\,000$ and $p = 2\,000$, with $x_i \sim .4\,\mathcal{N}(\mu_1, I_p) + .6\,\mathcal{N}(\mu_2, I_p)$ for $[\mu_1^T, \mu_2^T]^T \sim \mathcal{N}\big(0, \frac{1}{p}\bigl[\begin{smallmatrix} 20 & 12 \\ 12 & 30 \end{smallmatrix}\bigr] \otimes I_p\big)$, and $c_0 = \frac{1}{2}$, $n = 4\,000$, $p = 2\,000$, $b = 0$. For real-world images, it uses $p = 4\,096$ VGG features of randomly BigGAN-generated images and $p = 784$-dimensional real-world images from the Fashion-MNIST dataset. |
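
For context, the reported setup is straightforward to mock up. Below is a minimal Python/NumPy sketch of scenario (i) on the synthetic two-class mixture described above, assuming the two-way punctured kernel takes the form $K = \frac{1}{p}\, B \odot \big[(S \odot X)^T (S \odot X)\big]$, with an entrywise Bernoulli($\varepsilon_S$) data mask $S$ and a symmetric Bernoulli($\varepsilon_B$) kernel mask $B$ with zero diagonal ($b = 0$). All variable names and the sign-based clustering step are illustrative and not taken from the authors' repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup from the paper: n = 4000, p = 2000 (c0 = p/n = 1/2).
# Reduce n and p for a quicker run; only the scale changes.
n, p = 4000, 2000
eps_S, eps_B = 0.2, 1.0   # scenario (i): 5-fold data puncturing, no kernel puncturing

# Class means: [mu1^T, mu2^T]^T ~ N(0, (1/p) [[20, 12], [12, 30]] (x) I_p)
L = np.linalg.cholesky(np.array([[20.0, 12.0], [12.0, 30.0]]) / p)
mus = L @ rng.standard_normal((2, p))          # rows: mu1, mu2

# Two-class mixture: x_i ~ .4 N(mu1, I_p) + .6 N(mu2, I_p); X is p x n
labels = rng.choice(2, size=n, p=[0.4, 0.6])
X = mus[labels].T + rng.standard_normal((p, n))

# Data puncturing: keep each entry of X independently with probability eps_S
S = rng.random((p, n)) < eps_S
XS = np.where(S, X, 0.0)

# Kernel puncturing: symmetric Bernoulli(eps_B) mask with zero diagonal (b = 0)
Bu = np.triu(rng.random((n, n)) < eps_B, k=1)
B = Bu | Bu.T

# Two-way punctured kernel matrix
K = (B * (XS.T @ XS)) / p

# Spectral clustering from the second dominant eigenvector v2 of K
eigvals, eigvecs = np.linalg.eigh(K)           # ascending eigenvalue order
v2 = eigvecs[:, -2]
pred = (v2 > 0).astype(int)
acc = max(np.mean(pred == labels), np.mean(pred != labels))  # labels known up to a swap
print(f"clustering accuracy: {acc:.3f}")
```

Swapping in $\varepsilon_S = 1$, $\varepsilon_B = .04$ reproduces scenario (ii) while keeping $\varepsilon_B \varepsilon_S^2$ unchanged ($1 \times .2^2 = .04 \times 1^2 = .04$), which is the invariance the paper exploits when comparing the two regimes.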