On the Spectrum of Random Features Maps of High Dimensional Data

Authors: Zhenyu Liao, Romain Couillet

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complete this article by showing that our theoretical results, derived from Gaussian mixture models, show an unexpectedly close match in practice when applied to some real-world datasets. We consider two different types of classification tasks: one on handwritten digits of the popular MNIST (LeCun et al., 1998) database (numbers 6 and 8), and the other on epileptic EEG time series data (Andrzejak et al., 2001) (sets B and E).
Researcher Affiliation | Academia | 1Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, Université Paris-Saclay, France; 2G-STATS Data Science Chair, GIPSA-lab, Université Grenoble-Alpes, France. Correspondence to: Zhenyu Liao <zhenyu.liao@l2s.centralesupelec.fr>, Romain Couillet <romain.couillet@centralesupelec.fr>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Python 3 codes to reproduce the results in this section are available at https://github.com/Zhenyu-LIAO/RMT4RFM.
Open Datasets | Yes | We consider two different types of classification tasks: one on handwritten digits of the popular MNIST (LeCun et al., 1998) database (numbers 6 and 8), and the other on epileptic EEG time series data (Andrzejak et al., 2001) (sets B and E).
Dataset Splits | No | The paper mentions 'randomly selected vectorized images' and 'randomly picked EEG segments' used to construct the Gram matrix for spectral clustering, but it does not specify train/validation/test splits for model training.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'Python 3 codes' but gives no version numbers for Python itself or for any of the libraries used in the experiments.
Experiment Setup | No | The paper reports the data dimensions (p, T) and the number of random features (n), and notes that expectations are 'estimated by averaging over 500 realizations of W' and that accuracies are 'averaged over 50 runs'. However, it omits training hyperparameters (e.g., learning rates, batch sizes, or optimizer settings) and the configuration of the k-means algorithm.
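For concreteness, the pipeline the assessment rows refer to (build the Gram matrix of a random feature map, then spectrally cluster) can be sketched as follows. This is a minimal illustration, not the authors' released code: the sizes (p, T, n), the ReLU nonlinearity, and the synthetic two-class Gaussian mixture standing in for the MNIST 6-vs-8 and EEG data are all assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's exact values):
# p = data dimension, T = number of samples, n = number of random features.
p, T, n = 64, 200, 512

# Synthetic two-class Gaussian mixture with opposite means +/- mu.
mu = 3.0 / np.sqrt(p) * np.ones(p)
labels = np.repeat([0, 1], T // 2)
X = np.where(labels == 0, 1.0, -1.0) * mu[:, None] + rng.standard_normal((p, T))

# Random feature map Sigma = sigma(W X), here with Gaussian W and ReLU.
W = rng.standard_normal((n, p))
Sigma = np.maximum(W @ X, 0.0)

# Centered Gram matrix of the random features, G = Sigma_c^T Sigma_c / n.
Sigma_c = Sigma - Sigma.mean(axis=1, keepdims=True)
G = Sigma_c.T @ Sigma_c / n  # T x T, symmetric

# Spectral clustering: split samples on the top eigenvector of G.
eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
v = eigvecs[:, -1]
pred = (v > np.median(v)).astype(int)

# Clustering accuracy, up to the arbitrary label permutation.
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy: {acc:.2f}")
```

The paper additionally averages such accuracies over many draws of W (50 runs in its reported figures); the sketch above uses a single draw and a median-threshold split in place of a full k-means step.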