Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks
Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Dots represent simulations, while solid lines are obtained by integration of the ODEs given by Eqs. (18). |
| Researcher Affiliation | Academia | Rodrigo Veiga (IdePHICS, EPFL, Lausanne; IFUSP, USP, São Paulo); Ludovic Stephan (IdePHICS, EPFL, Lausanne); Bruno Loureiro (IdePHICS, EPFL, Lausanne); Florent Krzakala (IdePHICS, EPFL, Lausanne); Lenka Zdeborová (SPOC, EPFL, Lausanne) |
| Pseudocode | No | The paper describes equations and dynamics but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/rodsveiga/phdiag_sgd |
| Open Datasets | No | The paper uses synthetic 'Gaussian data P(x) = N(x|0, 1)' but does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available dataset. |
| Dataset Splits | No | The paper states 'The data set is composed of n pairs (x^ν, y^ν)_{ν∈[n]} ⊂ ℝ^{d+1} identically and independently sampled from P(x, y)' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Specifically, consider the following learning rate and hidden layer width scaling with d: γ = γ₀/d^δ, p = p₀d^κ... Teacher weights are such that ρ_rs = δ_rs. The initial student weights are chosen such that the dimension d can be varied without changing the initial conditions Q⁰, M⁰, P... Henceforth, we take σ(x) = erf(x/√2)... Noise: Δ = 10⁻³. Activation function: σ(x) = erf(x/√2). Data distribution: P(x) = N(x|0, 1). |
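The setup reported in the table (online SGD on a two-layer network, Gaussian inputs, erf activation, teacher with ρ_rs = δ_rs, learning rate γ = γ₀/d^δ) can be sketched as follows. This is a minimal illustration, not the authors' released code (see their repository above): the 1/p and 1/k output normalizations, the fixed second layer, and the step count are assumptions made for the sketch.

```python
import numpy as np
from math import erf, pi

rng = np.random.default_rng(0)

d, k, p = 200, 2, 2          # input dim, teacher width, student width (p = p0 d^kappa, kappa = 0)
gamma0, delta = 0.5, 0.0
gamma = gamma0 / d**delta    # learning-rate scaling gamma = gamma0 / d^delta
Delta = 1e-3                 # label noise variance, Delta = 10^-3

sigma = np.vectorize(lambda u: erf(u / np.sqrt(2.0)))      # sigma(x) = erf(x / sqrt(2))
dsigma = lambda u: np.sqrt(2.0 / pi) * np.exp(-u**2 / 2.0)  # its derivative

# Teacher weights with rho_rs = delta_rs: orthonormal rows rescaled by sqrt(d),
# so that W* W*^T / d = I_k.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
Wstar = np.sqrt(d) * Q.T

def teacher(x):
    return sigma(Wstar @ x / np.sqrt(d)).sum() / k

def student(x, W):
    return sigma(W @ x / np.sqrt(d)).sum() / p

def gen_error(W, n_test=2000):
    # Monte Carlo estimate of the population (generalization) error.
    errs = [(student(x, W) - teacher(x))**2
            for x in rng.standard_normal((n_test, d))]
    return 0.5 * np.mean(errs)

W = rng.standard_normal((p, d))  # student initialization
e0 = gen_error(W)

# Online (one-pass) SGD: a fresh Gaussian sample x ~ N(0, I_d) at every step.
for step in range(20000):
    x = rng.standard_normal(d)
    y = teacher(x) + np.sqrt(Delta) * rng.standard_normal()
    pre = W @ x / np.sqrt(d)
    err = student(x, W) - y
    grad = err * (dsigma(pre)[:, None] / p) * x[None, :] / np.sqrt(d)
    W -= gamma * grad

e1 = gen_error(W)
```

With δ = 0 this is the "classical" constant-learning-rate scaling; varying δ and κ moves the simulation across the regimes whose ODE description (Eqs. (18) in the paper) the dots-vs-lines comparison in the table refers to.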