Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Dots represent simulations, while solid lines are obtained by integration of the ODEs given by Eqs. (18)." |
| Researcher Affiliation | Academia | Rodrigo Veiga (IdePHICS, EPFL, Lausanne; IFUSP, USP, São Paulo), Ludovic Stephan (IdePHICS, EPFL, Lausanne), Bruno Loureiro (IdePHICS, EPFL, Lausanne), Florent Krzakala (IdePHICS, EPFL, Lausanne), Lenka Zdeborová (SPOC, EPFL, Lausanne) |
| Pseudocode | No | The paper describes equations and dynamics but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/rodsveiga/phdiag_sgd |
| Open Datasets | No | The paper uses synthetic Gaussian data with P(x) = N(x\|0, 1) but does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available dataset. |
| Dataset Splits | No | The paper mentions "The data set is composed of n pairs (x^ν, y^ν)_{ν ∈ [n]} ⊂ R^{d+1} identically and independently sampled from P(x, y)" but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Specifically, consider the following learning rate and hidden layer width scaling with d: γ = γ0 / d^δ, p = p0 d^κ ... Teacher weights are such that ρ_rs = δ_rs. The initial student weights are chosen such that the dimension d can be varied without changing the initial conditions Q^0, M^0, P ... Henceforth, we take σ(x) = erf(x/√2) ... Noise: Δ = 10^-3. Activation function: σ(x) = erf(x/√2). Data distribution: P(x) = N(x\|0, 1). (A minimal sketch of this setup appears below the table.) |
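
The synthetic data referenced in the Open Datasets and Dataset Splits rows can be generated directly. Below is a minimal sketch, assuming a small teacher network with k hidden units, second-layer weights a_star, and the normalisations shown in the comments (these choices are illustrative and not taken from the paper); it draws x ~ N(0, I_d), produces labels with the erf teacher, and adds noise of variance 10^-3 as quoted above.

```python
import numpy as np
from scipy.special import erf

# Minimal sketch (not the authors' released code) of the synthetic teacher-student data.
rng = np.random.default_rng(0)

d = 1000          # input dimension
k = 2             # number of teacher hidden units (illustrative choice)
Delta = 1e-3      # label-noise variance, as quoted in the Experiment Setup row

# Teacher first-layer weights with rho_rs = delta_rs, i.e. W_star @ W_star.T / d = I_k.
W_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
W_star = W_star.T * np.sqrt(d)
a_star = np.ones(k) / k            # teacher second layer (illustrative normalisation)

def teacher(x):
    """Label: sum_r a*_r erf(w*_r . x / sqrt(2 d)) plus Gaussian noise of variance Delta."""
    lam = W_star @ x / np.sqrt(d)                  # teacher local fields
    return a_star @ erf(lam / np.sqrt(2)) + np.sqrt(Delta) * rng.standard_normal()

def sample(n):
    """Draw n i.i.d. pairs (x, y) with x ~ N(0, I_d)."""
    X = rng.standard_normal((n, d))
    y = np.array([teacher(x) for x in X])
    return X, y

X, y = sample(10)
```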
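
For the Experiment Setup row, the quoted scalings can be made concrete with a short one-pass SGD loop. The self-contained sketch below assumes illustrative values for the exponents δ and κ, a fixed uniform second layer, and the same teacher normalisation as above; only the erf activation, the squared-loss update, the scalings γ = γ0/d^δ and p = p0 d^κ, and the overlaps M = W W*^T/d and Q = W W^T/d are taken from the quoted setup.

```python
import numpy as np
from scipy.special import erf

# Minimal sketch (not the authors' code) of one-pass SGD in the quoted scaling regime.
rng = np.random.default_rng(1)

d, k = 1000, 2                            # input dimension, teacher width
Delta = 1e-3                              # label-noise variance
gamma0, delta_exp, p0, kappa = 1.0, 0.0, 1.0, 0.5   # illustrative constants and exponents
gamma = gamma0 / d**delta_exp             # learning rate: gamma = gamma0 / d^delta
p = max(1, int(p0 * d**kappa))            # student width: p = p0 * d^kappa

sigma = lambda z: erf(z / np.sqrt(2.0))                          # activation
dsigma = lambda z: np.sqrt(2.0 / np.pi) * np.exp(-0.5 * z**2)    # its derivative

# Teacher with rho_rs = delta_rs; student initialised at random.
W_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
W_star = W_star.T * np.sqrt(d)
W = rng.standard_normal((p, d))           # student first-layer weights
a = np.ones(p) / p                        # student second layer, kept fixed (illustrative)

for step in range(1000):                  # online SGD: one fresh sample per step
    x = rng.standard_normal(d)                                            # x ~ N(0, I_d)
    y = sigma(W_star @ x / np.sqrt(d)).mean() + np.sqrt(Delta) * rng.standard_normal()
    lam = W @ x / np.sqrt(d)                                              # student local fields
    err = a @ sigma(lam) - y
    W -= gamma * np.outer(err * a * dsigma(lam), x) / np.sqrt(d)          # squared-loss gradient step

M = W @ W_star.T / d                      # student-teacher overlap
Q = W @ W.T / d                           # student-student overlap
```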