Generalization Error of Generalized Linear Models in High Dimensions

Authors: Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Sundeep Rangan, Alyson Fletcher

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our theoretical results on a number of synthetic data experiments. For all the experiments, the training and test data are generated following the model in Section 2. We generate the training and test eigenvalues as i.i.d. with lognormal distributions. In all three cases in Fig. 2, the SE theory exactly matches the simulated values for the test MSE. (See the data-generation sketch after this table.)
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, USA; (2) Department of Statistics, University of California, Los Angeles, Los Angeles, USA; (3) Department of Electrical and Computer Engineering, New York University, Brooklyn, New York, USA.
Pseudocode | Yes | Algorithm 1: ML-VAMP GLM Learning Algorithm
Open Source Code | Yes | Code available at: https://github.com/melikaemami/Generalization-Error-of-GLMs
Open Datasets | No | For all the experiments, the training and test data are generated following the model in Section 2. We generate the training and test eigenvalues as i.i.d. with lognormal distributions. The paper describes generating synthetic data and does not provide access information for a public or open dataset.
Dataset Splits | No | The paper mentions generating training and test data, but does not specify explicit train/validation/test dataset splits (e.g., percentages or counts for a fixed dataset), as data are generated on the fly for each experiment instance.
Hardware Specification | No | The paper mentions using Python's sklearn package and TensorFlow with the ADAM optimizer, but no specific hardware components (e.g., GPU/CPU models, memory) are detailed.
Software Dependencies | No | The paper mentions using the sklearn package and TensorFlow but does not specify their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup | Yes | The noise variance σ_d^2 is set for an SNR level of 10 dB. We use a standard mean-square error (MSE) output loss, f_out(y, p) = (y − p)^2 / (2σ_d^2). We take λ = 10^(−4)/E(w_{0j})^2. We use a logistic output P(y = 1) = 1/(1 + e^(−p)), a binary cross-entropy output loss f_out(y, p), and ℓ2-regularization level λ. We scale the data matrix so that the input satisfies E(p^2) = 9. We also take σ_d^2 = 0.01. We use the ADAM optimizer (Kingma & Ba, 2014) with 200 epochs. (See the training sketch after this table.)
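
The "Research Type" row notes that all experiments use synthetic data whose covariance eigenvalues are drawn i.i.d. from a lognormal distribution. The following is a minimal sketch of how such data could be generated; the dimensions, the lognormal parameters, and the linear-Gaussian output channel are illustrative assumptions, not the paper's exact Section 2 configuration.

```python
# Hypothetical sketch: Gaussian features with an i.i.d. lognormal
# eigenvalue spectrum, plus a linear-Gaussian GLM teacher (assumed).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 300, 300, 200   # placeholder sizes

def make_split(n, d, w0, sigma_d=0.1):
    # Draw covariance eigenvalues i.i.d. lognormal, then sample Gaussian
    # features with that spectrum via a random orthogonal eigenbasis.
    eigs = rng.lognormal(mean=0.0, sigma=1.0, size=d)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    X = rng.standard_normal((n, d)) @ (Q * np.sqrt(eigs)) @ Q.T
    # Linear-Gaussian output channel: y = X w0 + noise.
    y = X @ w0 + sigma_d * rng.standard_normal(n)
    return X, y

w0 = rng.standard_normal(d) / np.sqrt(d)   # teacher weights
X_tr, y_tr = make_split(n_train, d, w0)
X_te, y_te = make_split(n_test, d, w0)     # test split gets its own spectrum
```

Drawing a fresh spectrum for the test split mirrors the quoted setup, in which the training and test eigenvalues are generated as separate i.i.d. lognormal draws.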
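
The "Experiment Setup" row mixes two configurations: an MSE output channel and a logistic output trained in TensorFlow with ADAM for 200 epochs. Below is a hedged sketch of the logistic case only; the data, dimensions, and the fixed λ value are placeholders (the quoted λ rule applies to the MSE case), and the teacher-model label generation is an assumption for illustration.

```python
# Hypothetical sketch of the logistic setup: sigmoid output, binary
# cross-entropy loss, l2 regularization, ADAM, 200 epochs.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
n, d = 500, 200
lam = 1e-4                               # illustrative l2 level (assumed)

w0 = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d)).astype("float32")
p = X @ w0
X *= 3.0 / np.sqrt(np.mean(p**2))        # scale data matrix so E(p^2) = 9
p = X @ w0
# Labels from the logistic channel P(y = 1) = 1/(1 + e^(-p)).
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-p))).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(d,)),
    tf.keras.layers.Dense(1, activation="sigmoid",
                          kernel_regularizer=tf.keras.regularizers.l2(lam)),
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="binary_crossentropy")    # the f_out(y, p) of the row above
model.fit(X, y, epochs=200, verbose=0)
```

For the MSE channel, the analogous fit could use sklearn's Ridge with alpha set from the quoted λ = 10^(−4)/E(w_{0j})^2 rule, consistent with the paper's mention of the sklearn package.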