Learning Gaussian Mixtures with Generalized Linear Models: Precise Asymptotics in High-dimensions

Authors: Bruno Loureiro, Gabriele Sicuro, Cédric Gerbelot, Alessandro Pacco, Florent Krzakala, Lenka Zdeborová

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of the ℓ1 penalty with respect to ℓ2; b) max-margin multiclass classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for K > 2. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures closely capture the learning curve of classification tasks on real data sets. Figure 2: Learning curves... Theoretical predictions (full lines) are compared with the results of numerical experiments (dots)... (A minimal sketch of the ℓ1-vs-ℓ2 comparison in task a) appears after the table.)
Researcher Affiliation | Academia | Bruno Loureiro, IdePHICS, EPFL, Lausanne; Gabriele Sicuro, Department of Mathematics, King's College London; Cédric Gerbelot, Laboratoire de Physique de l'École Normale Supérieure; Alessandro Pacco, IdePHICS, EPFL, Lausanne; Florent Krzakala, IdePHICS, EPFL, Lausanne; Lenka Zdeborová, SPOC, EPFL, Lausanne
Pseudocode | No | The paper describes algorithms and derivations in narrative text and mathematical formulas but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | A repository with a polished version of the code we used to solve the equations is available on GitHub [58] (see also Appendix B.5).
Open Datasets | Yes | MNIST [61] and Fashion-MNIST [62]
Dataset Splits | No | No specific dataset split information for training, validation, or testing (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) is provided, beyond mentioning 'dividing the database into two balanced classes' and selecting 'n < ntot elements to perform the training, leaving the others for the test of the performances'. (A sketch of this split procedure appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions 'the Elastic Net module in the Scikit-learn package [60]' and 'the Logistic Regression module in the Scikit-learn package [60]', but it does not specify version numbers for Scikit-learn or any other software dependency.
Experiment Setup | Yes | We estimate the dependence of the generalisation error ϵg on the sample complexity α and on the regularisation λ. We assume Gaussian means µk ∼ N(0, Id/d) and diagonal covariances Σk = Σ = Id. Finally, we adopt a ridge penalty, r(W) = ∥W∥²_F/2, and we focus on the case of balanced clusters, i.e., ρk = 1/K, for the sake of simplicity. In all presented cases, a quadratic regularisation has been adopted. Numerical experiments have been performed using d = 10^3. In Fig. 5 we also plot, as a reference, the results of a classification task performed on synthetic data, obtained by generating a genuine Gaussian mixture with the means and covariances of the real data set. We adopted a logistic loss with ℓ2 regularisation. Figure 5: Generalisation error and training loss for binary classification using the logistic loss on MNIST with λ = 0.05 (left) and on Fashion-MNIST with λ = 1 (right). (A sketch of the synthetic setup appears after the table.)
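
As a companion to task a) in the Research Type row, here is a minimal sketch of the ℓ1-versus-ℓ2 comparison on a sparse-means mixture. The dimensions, sparsity level, and signal strength are illustrative assumptions, not the paper's exact high-dimensional normalisation; only the qualitative setup (balanced two-cluster Gaussian mixture with a sparse mean, logistic loss with an ℓ1 or ℓ2 penalty) follows the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n_train, n_test = 200, 100, 2000   # illustrative sizes, not the paper's d = 10^3
k_sparse = 10                          # number of non-zero mean components (assumed)

# Sparse cluster mean: the signal lives on a few coordinates only.
mu = np.zeros(d)
mu[rng.choice(d, size=k_sparse, replace=False)] = 1.5

def sample(n):
    """Balanced two-cluster mixture: x = y * mu + z, with z ~ N(0, I_d)."""
    y = rng.choice([-1.0, 1.0], size=n)
    return y[:, None] * mu + rng.normal(size=(n, d)), y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
    clf = LogisticRegression(penalty=penalty, C=1.0, solver=solver, max_iter=5000)
    clf.fit(X_tr, y_tr)
    print(f"{penalty}: test error = {1 - clf.score(X_te, y_te):.3f}")
```

In this regime (n < d, sparse signal) the ℓ1-penalised estimator typically attains a lower test error than ℓ2, which is the efficiency gap the paper characterises exactly.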
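
For the Dataset Splits row, the following sketches the only split procedure the paper describes: divide the database into two balanced classes, train on n < ntot elements, and test on the rest. The even-versus-odd-digit grouping, the value of n, and the C = 1/λ mapping to scikit-learn's convention are all assumptions made here for illustration; fetch_openml is used merely as one convenient way to obtain MNIST.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

# Load MNIST; the grouping into two balanced classes is assumed here to be
# even vs. odd digits -- the page does not specify the grouping used.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
labels = y.astype(int) % 2

rng = np.random.default_rng(0)
perm = rng.permutation(X.shape[0])
n = 2000                        # illustrative; the paper sweeps n to vary alpha = n/d
train, test = perm[:n], perm[n:]

# lambda = 0.05 as quoted above; C = 1/lambda is an assumed mapping, since
# scikit-learn's exact regularisation normalisation differs from the paper's.
clf = LogisticRegression(penalty="l2", C=1 / 0.05, max_iter=1000)
clf.fit(X[train] / 255.0, labels[train])
print(f"test error = {1 - clf.score(X[test] / 255.0, labels[test]):.3f}")
```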
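
Finally, a sketch of the synthetic setup quoted in the Experiment Setup row: means µk ∼ N(0, Id/d), covariances Σk = Id, balanced clusters ρk = 1/K, quadratic (ridge) regularisation, d = 10^3. Scikit-learn's RidgeClassifier is used here as a stand-in for the square loss with ridge penalty; the specific λ and the α grid are illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(1)
d, K = 1000, 2        # d = 10^3 as in the paper; K = 2 clusters for simplicity
lam = 0.1             # illustrative regularisation strength

# Gaussian means mu_k ~ N(0, I_d / d); covariances Sigma_k = I_d; rho_k = 1/K.
mus = rng.normal(size=(K, d)) / np.sqrt(d)

def sample(n):
    k = rng.integers(K, size=n)             # balanced cluster assignments
    return mus[k] + rng.normal(size=(n, d)), k

X_te, y_te = sample(5000)
for alpha in [0.5, 1.0, 2.0, 4.0]:           # sample complexity alpha = n/d
    X_tr, y_tr = sample(int(alpha * d))
    clf = RidgeClassifier(alpha=lam).fit(X_tr, y_tr)
    print(f"alpha = {alpha}: eps_g ~ {1 - clf.score(X_te, y_te):.3f}")
```

Each printed error corresponds to one experimental "dot" on a learning curve like those in Figure 2; the paper's theory predicts the full curve exactly in the high-dimensional limit at fixed α.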