On the Spectrum of Random Features Maps of High Dimensional Data

Authors: Zhenyu Liao, Romain Couillet

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complete this article by showing that our theoretical results, derived from Gaussian mixture models, show an unexpectedly close match in practice when applied to some real-world datasets. We consider two different types of classification tasks: one on handwritten digits of the popular MNIST (LeCun et al., 1998) database (numbers 6 and 8), and the other on epileptic EEG time series data (Andrzejak et al., 2001) (sets B and E).
Researcher Affiliation | Academia | 1Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, Université Paris-Saclay, France; 2G-STATS Data Science Chair, GIPSA-lab, Université Grenoble-Alpes, France. Correspondence to: Zhenyu Liao <zhenyu.liao@l2s.centralesupelec.fr>, Romain Couillet <romain.couillet@centralesupelec.fr>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Python 3 codes to reproduce the results in this section are available at https://github.com/Zhenyu-LIAO/RMT4RFM.
Open Datasets | Yes | We consider two different types of classification tasks: one on handwritten digits of the popular MNIST (LeCun et al., 1998) database (numbers 6 and 8), and the other on epileptic EEG time series data (Andrzejak et al., 2001) (sets B and E).
Dataset Splits | No | The paper mentions 'randomly selected vectorized images' and 'randomly picked EEG segments' used to construct the Gram matrix for spectral clustering, but it does not specify train/validation/test splits for model training.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'Python 3 codes' but gives no version numbers for Python itself or for any of the libraries used in the experiments.
Experiment Setup | No | The paper reports the data dimensions (p, T) and the number of random features (n), and notes that expectations are 'estimated by averaging over 500 realizations of W' and that accuracies are 'averaged over 50 runs'. However, it omits training hyperparameters (e.g., learning rates, batch sizes, or optimizer settings) and the configuration of the k-means algorithm.
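For concreteness, the pipeline the assessment rows refer to (build the Gram matrix of a random feature map, then spectrally cluster) can be sketched as follows. This is a minimal illustration, not the authors' released code: the sizes (p, T, n), the ReLU nonlinearity, and the synthetic two-class Gaussian mixture standing in for the MNIST 6-vs-8 and EEG data are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's exact values):
# p = data dimension, T = number of samples, n = number of random features.
p, T, n = 64, 200, 512

# Synthetic two-class Gaussian mixture with opposite means +/- mu.
mu = 3.0 / np.sqrt(p) * np.ones(p)
labels = np.repeat([0, 1], T // 2)
X = np.where(labels == 0, 1.0, -1.0) * mu[:, None] + rng.standard_normal((p, T))

# Random feature map Sigma = sigma(W X), here with Gaussian W and ReLU.
W = rng.standard_normal((n, p))
Sigma = np.maximum(W @ X, 0.0)

# Centered Gram matrix of the random features, G = Sigma_c^T Sigma_c / n.
Sigma_c = Sigma - Sigma.mean(axis=1, keepdims=True)
G = Sigma_c.T @ Sigma_c / n  # T x T, symmetric

# Spectral clustering: split samples on the top eigenvector of G.
eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
v = eigvecs[:, -1]
pred = (v > np.median(v)).astype(int)

# Clustering accuracy, up to the arbitrary label permutation.
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy: {acc:.2f}")
```

The paper additionally averages such accuracies over many draws of W (50 runs in its reported figures); the sketch above uses a single draw and a median-threshold split in place of a full k-means step.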