Learning Gaussian Mixtures with Generalized Linear Models: Precise Asymptotics in High-dimensions
Authors: Bruno Loureiro, Gabriele Sicuro, Cedric Gerbelot, Alessandro Pacco, Florent Krzakala, Lenka Zdeborová
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of ℓ1 penalty with respect to ℓ2; b) max-margin multiclass classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for K > 2. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets. Figure 2: Learning curves... Theoretical predictions (full lines) are compared with the results of numerical experiments (dots)... (A minimal sketch of task (a) is given after the table.) |
| Researcher Affiliation | Academia | Bruno Loureiro, IdePHICS, EPFL, Lausanne; Gabriele Sicuro, Department of Mathematics, King's College London; Cédric Gerbelot, Laboratoire de Physique de l'École Normale Supérieure; Alessandro Pacco, IdePHICS, EPFL, Lausanne; Florent Krzakala, IdePHICS, EPFL, Lausanne; Lenka Zdeborová, SPOC, EPFL, Lausanne |
| Pseudocode | No | The paper describes algorithms and derivations in narrative text and mathematical formulas but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | A repository with a polished version of the code we used to solve the equations is available on GitHub [58] (see also Appendix B.5). |
| Open Datasets | Yes | MNIST [61] and Fashion-MNIST [62] |
| Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing was provided, beyond mentioning 'dividing the database into two balanced classes' and selecting 'n < n_tot elements to perform the training, leaving the others for the test of the performances'. (A minimal sketch of this protocol is given after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions 'the Elastic Net module in the Scikit-learn package [60]' and 'the Logistic Regression module in the Scikit-learn package [60]', but it does not specify version numbers for Scikit-learn or any other software dependencies. |
| Experiment Setup | Yes | We estimate the dependence of the generalisation error ϵ_g on the sample complexity α and on the regularisation λ. We assume Gaussian means µ_k ∼ N(0, I_d/d) and diagonal covariances Σ_k ≡ Σ = I_d. Finally, we adopt a ridge penalty, r(W) = ∥W∥²_F/2, and we focus on the case of balanced clusters, i.e., ρ_k = 1/K for the sake of simplicity. In all presented cases, a quadratic regularisation has been adopted. Numerical experiments have been performed using d = 10³. In Fig. 5 we also plot, as reference, the results of a classification task performed on synthetic data, obtained by generating a genuine Gaussian mixture with the means and covariances of the real data set. We adopted a logistic loss with ℓ2 regularisation. Generalisation error and training loss for the binary classification using the logistic loss on MNIST with λ = 0.05 (left) and on Fashion-MNIST with λ = 1 (right). (A code sketch of this synthetic setup is given after the table.) |
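
The synthetic protocol quoted under Experiment Setup admits a compact reproduction. The sketch below is our own illustration, not the authors' released code [58]: it draws a balanced two-cluster Gaussian mixture with means µ_k ∼ N(0, I_d/d) and identity covariance, fits an ℓ2-regularised logistic classifier with scikit-learn, and estimates the generalisation error ϵ_g. The values of α and the test-set size, and the mapping of λ onto scikit-learn's inverse-regularisation parameter C, are assumptions on our part.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of the synthetic setup quoted above (our own illustration, not the
# authors' released code). Two balanced Gaussian clusters in dimension d,
# with means mu_k ~ N(0, I_d / d) and covariance Sigma = I_d.
rng = np.random.default_rng(0)
d = 1000                      # dimension used in the paper's experiments
alpha = 2.0                   # sample complexity alpha = n/d (assumed value)
lam = 0.05                    # ridge strength lambda (assumed value)
n_train, n_test = int(alpha * d), 10_000

mu = rng.normal(scale=1.0 / np.sqrt(d), size=(2, d))  # cluster means

def sample(n):
    """Draw n balanced samples: y ~ Unif{0, 1}, x = mu_y + z, z ~ N(0, I_d)."""
    y = rng.integers(0, 2, size=n)
    return mu[y] + rng.normal(size=(n, d)), y

X_train, y_train = sample(n_train)
X_test, y_test = sample(n_test)

# Logistic loss with l2 (ridge) penalty. scikit-learn's C is an inverse
# regularisation strength; mapping lambda -> C = 1 / (n * lambda) assumes
# the paper penalises the average (not the sum) of the per-sample losses.
clf = LogisticRegression(C=1.0 / (n_train * lam), max_iter=5000)
clf.fit(X_train, y_train)
eps_g = 1.0 - clf.score(X_test, y_test)   # generalisation (test) error
print(f"alpha = {alpha}, lambda = {lam}, eps_g ~ {eps_g:.3f}")
```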
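
Task (a) from the Research Type row, comparing the efficiency of the ℓ1 penalty with ℓ2 when the cluster means are sparse, can be sketched in the same spirit. The sparsity level s, the problem sizes, and the single value of C below are illustrative assumptions; the paper instead sweeps λ using the Elastic Net and Logistic Regression modules of scikit-learn [60].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Compare l1 vs l2 penalties when the cluster means are sparse
# (illustrative sketch; sparsity level and penalty strength are assumptions).
rng = np.random.default_rng(1)
d, n_train, n_test = 500, 1000, 5000
s = 25  # number of non-zero coordinates in each mean (assumed)

mu = np.zeros((2, d))
support = rng.choice(d, size=s, replace=False)
mu[:, support] = rng.normal(scale=1.0 / np.sqrt(s), size=(2, s))

def sample(n):
    y = rng.integers(0, 2, size=n)
    return mu[y] + rng.normal(size=(n, d)), y

X_train, y_train = sample(n_train)
X_test, y_test = sample(n_test)

# liblinear supports the l1 penalty for binary problems; lbfgs handles l2.
for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
    clf = LogisticRegression(penalty=penalty, solver=solver, C=1.0,
                             max_iter=5000)
    clf.fit(X_train, y_train)
    print(f"{penalty}: test error ~ {1.0 - clf.score(X_test, y_test):.3f}")
```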
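
Finally, the real-data protocol quoted under Dataset Splits, dividing the database into two balanced classes and training on n < n_tot samples, can be made concrete as follows. The even-versus-odd digit grouping, the value of n, and the pixel rescaling are our assumptions, since the paper does not fix them in the quoted passage.

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression

# Minimal sketch of the balanced binary split described in the paper.
# The even-vs-odd grouping and the train size n are our assumptions; the
# quoted passage only states that the two classes are balanced and that
# n < n_tot samples are used for training, the rest for testing.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
labels = y.astype(int) % 2            # 0: even digits, 1: odd digits

rng = np.random.default_rng(2)
perm = rng.permutation(len(X))
n = 2000                              # n < n_tot training samples (assumed)
train, test = perm[:n], perm[n:]

clf = LogisticRegression(C=1.0 / (n * 0.05), max_iter=2000)  # lambda = 0.05
clf.fit(X[train] / 255.0, labels[train])
print("test error ~", round(1.0 - clf.score(X[test] / 255.0, labels[test]), 3))
```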