Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks
Authors: Rodrigo Veiga, Ludovic Stephan, Bruno Loureiro, Florent Krzakala, Lenka Zdeborová
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Dots represent simulations, while solid lines are obtained by integration of the ODEs given by Eqs. (18). |
| Researcher Affiliation | Academia | Rodrigo Veiga (IdePHICS, EPFL, Lausanne; IFUSP, USP, São Paulo); Ludovic Stephan (IdePHICS, EPFL, Lausanne); Bruno Loureiro (IdePHICS, EPFL, Lausanne); Florent Krzakala (IdePHICS, EPFL, Lausanne); Lenka Zdeborová (SPOC, EPFL, Lausanne) |
| Pseudocode | No | The paper describes equations and dynamics but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/rodsveiga/phdiag_sgd |
| Open Datasets | No | The paper uses synthetic 'Gaussian data P(x) = N(x|0, 1)' but does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available dataset. |
| Dataset Splits | No | The paper mentions 'The data set is composed of n pairs (x^ν, y^ν), ν ∈ [n], in R^(d+1), identically and independently sampled from P(x, y)' but does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined split citations). |
| Hardware Specification | No | The paper does not provide specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Specifically, consider the following learning rate and hidden layer width scaling with d: γ = γ0 d^(-δ), p = p0 d^κ... Teacher weights are such that ρ_rs = δ_rs. The initial student weights are chosen such that the dimension d can be varied without changing the initial conditions Q^0, M^0, P... Noise variance: 10^-3. Activation function: σ(x) = erf(x/sqrt(2)). Data distribution: P(x) = N(x|0, 1). |
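The setup extracted above can be sketched as a minimal online-SGD simulation. This is an illustrative reconstruction, not the authors' code (which is at the linked GitHub repository): the dimension d, step count, scaling exponents (δ, κ), the orthonormal-teacher construction, and the fixed second layer (a committee-machine simplification) are all assumptions chosen for brevity; only the erf activation, Gaussian data, 10^-3 label noise, and the γ = γ0 d^(-δ), p = p0 d^κ scalings come from the table.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

# --- setup following the table (numeric values below are illustrative) ---
d = 400                      # input dimension (assumption)
k = 2                        # teacher hidden units (assumption)
gamma0, delta = 0.5, 0.0     # learning rate scaling: gamma = gamma0 * d**(-delta)
p0, kappa = 2, 0.0           # width scaling: p = p0 * d**kappa
noise_var = 1e-3             # label noise variance, as in the table
n_steps = 2000               # number of online SGD steps (assumption)

p = int(p0 * d**kappa)
gamma = gamma0 * d**(-delta)

# activation from the paper: sigma(x) = erf(x / sqrt(2))
sigma = np.vectorize(lambda z: erf(z / sqrt(2.0)))

# teacher weights with rho_rs = delta_rs: orthonormal rows rescaled by sqrt(d)
W_star = np.linalg.qr(rng.standard_normal((d, k)))[0].T * sqrt(d)

# student first-layer weights; second layer fixed at 1/p (simplification)
W = rng.standard_normal((p, d))

def forward(weights, x, width):
    # committee-machine output: average of sigma over hidden units
    return sigma(weights @ x / sqrt(d)).sum() / width

losses = []
for t in range(n_steps):
    x = rng.standard_normal(d)  # Gaussian data P(x) = N(x | 0, I)
    y = forward(W_star, x, k) + sqrt(noise_var) * rng.standard_normal()
    pre = W @ x / sqrt(d)
    y_hat = sigma(pre).sum() / p
    err = y_hat - y
    losses.append(0.5 * err**2)
    # d/dz erf(z / sqrt(2)) = sqrt(2/pi) * exp(-z**2 / 2)
    grad_pre = sqrt(2.0 / np.pi) * np.exp(-pre**2 / 2.0) / p
    W -= gamma * err * np.outer(grad_pre, x) / sqrt(d)

print(f"mean loss, first 100 steps: {np.mean(losses[:100]):.4f}")
print(f"mean loss, last 100 steps:  {np.mean(losses[-100:]):.4f}")
```

Setting δ > 0 slows the dynamics (smaller steps) while κ > 0 widens the student with d; sweeping these two exponents is what traces out the paper's phase diagram, with the simulation dots compared against the ODE integration of Eqs. (18).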