Asymptotics of feature learning in two-layer networks after one gradient-step

Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue Lu, Lenka Zdeborová, Bruno Loureiro

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide compelling numerical support that this theoretical characterization captures very closely the learning curves of two-layer networks trained following the protocol (3), and that sRF thus provide a valuable analytical playground..."
Researcher Affiliation | Academia | 1 Statistical Physics of Computation lab., École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; 2 Information, Learning & Physics lab., École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; 3 Harvard University, School of Engineering and Applied Sciences; 4 Département d'Informatique, École Normale Supérieure (ENS) PSL & CNRS, F-75230 Paris cedex 05, France.
Pseudocode | No | The paper describes mathematical models and derivations but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code used in the present work is available here."
Open Datasets | No | The paper uses a synthetic dataset generated from isotropic Gaussian covariates, f*(x) = σ*(⟨θ*, x⟩/√d) with x ~ N(0, I_d), but does not provide access information (link, DOI, or citation to a public repository) for a publicly available dataset.
Dataset Splits | No | The paper mentions training on subsets D0 and D1 with fixed ratios α0 = n0/d and α = n1/d, but does not explicitly describe a distinct validation split for hyperparameter tuning.
Hardware Specification | No | The paper mentions running simulations in various dimensions (e.g., d = p = 2000), but no specific hardware details such as GPU/CPU models, processors, or memory are provided.
Software Dependencies | No | The paper mentions using activation functions like σ = tanh and refers to statistical physics methods, but it does not list specific software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9").
Experiment Setup | Yes | "Simulations were run in dimensions d = p = 2000, for a learning rate η = 2.5p, and a readout regularization λ = 0.01. The readout was trained with n1 = 2d samples." (Figure 1 caption)
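The synthetic dataset and training protocol quoted in the table can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: details the table does not pin down, such as α0 = n0/d = 1, the 1/√d and 1/√p scalings, and the use of the squared loss for the first-layer gradient step, are assumptions here, and we shrink d = p to 200 for speed (the paper uses d = p = 2000).

```python
import numpy as np

rng = np.random.default_rng(0)

d = p = 200      # paper: d = p = 2000; reduced here for speed
n0 = d           # samples for the first-layer step (alpha0 = 1, an assumption)
n1 = 2 * d       # readout trained with n1 = 2d samples (Figure 1 caption)
eta = 2.5 * p    # learning rate eta = 2.5 p
lam = 0.01       # readout ridge regularization lambda

sigma = np.tanh  # activation reported in the paper's simulations

# Single-index target: f*(x) = sigma(<theta, x> / sqrt(d)), x ~ N(0, I_d)
theta = rng.standard_normal(d)
def target(X):
    return sigma(X @ theta / np.sqrt(d))

# Two-layer network: first-layer weights W (p x d), readout a (p,)
W = rng.standard_normal((p, d)) / np.sqrt(d)
a = rng.standard_normal(p) / np.sqrt(p)

# --- One full-batch gradient step on the first layer (squared loss) ---
X0 = rng.standard_normal((n0, d))
y0 = target(X0)
pre = X0 @ W.T / np.sqrt(d)                       # preactivations, (n0, p)
err = sigma(pre) @ a / np.sqrt(p) - y0            # residuals, (n0,)
# dL/dW for L = (1/2 n0) sum err^2, with sigma' = 1 - tanh^2
grad = ((err[:, None] * (1 - np.tanh(pre) ** 2) * a / np.sqrt(p)).T @ X0) \
       / (n0 * np.sqrt(d))
W1 = W - eta * grad

# --- Ridge regression on the readout, using the updated features ---
X1 = rng.standard_normal((n1, d))
y1 = target(X1)
Z = sigma(X1 @ W1.T / np.sqrt(d))                 # features, (n1, p)
a_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y1)

# Test error of the trained readout on fresh Gaussian samples
Xt = rng.standard_normal((4 * d, d))
test_mse = np.mean((sigma(Xt @ W1.T / np.sqrt(d)) @ a_hat - target(Xt)) ** 2)
print(f"test MSE: {test_mse:.3f}")
```

The two-stage structure (one large gradient step on the first layer, then a convex ridge fit of the readout) is the protocol the theoretical characterization addresses; the sketch only mirrors its shape, not the paper's exact parameterization.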