Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Dots represent simulations, while solid lines are obtained by integration of the ODEs given by Eqs. (18)." |
| Researcher Affiliation | Academia | Rodrigo Veiga (IdePHICS, EPFL, Lausanne; IFUSP, USP, São Paulo), Ludovic Stephan (IdePHICS, EPFL, Lausanne), Bruno Loureiro (IdePHICS, EPFL, Lausanne), Florent Krzakala (IdePHICS, EPFL, Lausanne), Lenka Zdeborová (SPOC, EPFL, Lausanne) |
| Pseudocode | No | The paper describes equations and dynamics but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/rodsveiga/phdiag_sgd |
| Open Datasets | No | The paper uses synthetic Gaussian data with P(x) = N(x\|0, 1) but does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available dataset. |
| Dataset Splits | No | The paper mentions "The data set is composed of n pairs (x^ν, y^ν)_{ν ∈ [n]} ⊂ R^{d+1} identically and independently sampled from P(x, y)" but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Specifically, consider the following learning rate and hidden layer width scaling with d: γ = γ0 / d^δ, p = p0 d^κ ... Teacher weights are such that ρ_rs = δ_rs. The initial student weights are chosen such that the dimension d can be varied without changing the initial conditions Q^0, M^0, P ... Henceforth, we take σ(x) = erf(x/√2) ... Noise: Δ = 10^-3. Activation function: σ(x) = erf(x/√2). Data distribution: P(x) = N(x\|0, 1). (A minimal sketch of this setup appears below the table.) |
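
The synthetic data referenced in the Open Datasets and Dataset Splits rows can be generated directly. Below is a minimal sketch, assuming a small teacher network with k hidden units, second-layer weights a_star, and the normalisations shown in the comments (these choices are illustrative and not taken from the paper); it draws x ~ N(0, I_d), produces labels with the erf teacher, and adds noise of variance 10^-3 as quoted above.

```python
import numpy as np
from scipy.special import erf

# Minimal sketch (not the authors' released code) of the synthetic teacher-student data.
rng = np.random.default_rng(0)

d = 1000          # input dimension
k = 2             # number of teacher hidden units (illustrative choice)
Delta = 1e-3      # label-noise variance, as quoted in the Experiment Setup row

# Teacher first-layer weights with rho_rs = delta_rs, i.e. W_star @ W_star.T / d = I_k.
W_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
W_star = W_star.T * np.sqrt(d)
a_star = np.ones(k) / k            # teacher second layer (illustrative normalisation)

def teacher(x):
    """Label: sum_r a*_r erf(w*_r . x / sqrt(2 d)) plus Gaussian noise of variance Delta."""
    lam = W_star @ x / np.sqrt(d)                  # teacher local fields
    return a_star @ erf(lam / np.sqrt(2)) + np.sqrt(Delta) * rng.standard_normal()

def sample(n):
    """Draw n i.i.d. pairs (x, y) with x ~ N(0, I_d)."""
    X = rng.standard_normal((n, d))
    y = np.array([teacher(x) for x in X])
    return X, y

X, y = sample(10)
```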
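
For the Experiment Setup row, the quoted scalings can be made concrete with a short one-pass SGD loop. The self-contained sketch below assumes illustrative values for the exponents δ and κ, a fixed uniform second layer, and the same teacher normalisation as above; only the erf activation, the squared-loss update, the scalings γ = γ0/d^δ and p = p0 d^κ, and the overlaps M = W W*^T/d and Q = W W^T/d are taken from the quoted setup.

```python
import numpy as np
from scipy.special import erf

# Minimal sketch (not the authors' code) of one-pass SGD in the quoted scaling regime.
rng = np.random.default_rng(1)

d, k = 1000, 2                            # input dimension, teacher width
Delta = 1e-3                              # label-noise variance
gamma0, delta_exp, p0, kappa = 1.0, 0.0, 1.0, 0.5   # illustrative constants and exponents
gamma = gamma0 / d**delta_exp             # learning rate: gamma = gamma0 / d^delta
p = max(1, int(p0 * d**kappa))            # student width: p = p0 * d^kappa

sigma = lambda z: erf(z / np.sqrt(2.0))                          # activation
dsigma = lambda z: np.sqrt(2.0 / np.pi) * np.exp(-0.5 * z**2)    # its derivative

# Teacher with rho_rs = delta_rs; student initialised at random.
W_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
W_star = W_star.T * np.sqrt(d)
W = rng.standard_normal((p, d))           # student first-layer weights
a = np.ones(p) / p                        # student second layer, kept fixed (illustrative)

for step in range(1000):                  # online SGD: one fresh sample per step
    x = rng.standard_normal(d)                                            # x ~ N(0, I_d)
    y = sigma(W_star @ x / np.sqrt(d)).mean() + np.sqrt(Delta) * rng.standard_normal()
    lam = W @ x / np.sqrt(d)                                              # student local fields
    err = a @ sigma(lam) - y
    W -= gamma * np.outer(err * a * dsigma(lam), x) / np.sqrt(d)          # squared-loss gradient step

M = W @ W_star.T / d                      # student-teacher overlap
Q = W @ W.T / d                           # student-student overlap
```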