Asymptotics of feature learning in two-layer networks after one gradient step
Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue Lu, Lenka Zdeborová, Bruno Loureiro
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide compelling numerical support that this theoretical characterization captures very closely the learning curves of two-layer networks trained following the protocol (3), and that sRFs [spiked Random Features] thus provide a valuable analytical playground... |
| Researcher Affiliation | Academia | ¹ Statistical Physics of Computation Lab., École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland; ² Information, Learning & Physics Lab., EPFL, 1015 Lausanne, Switzerland; ³ Harvard University, School of Engineering and Applied Sciences; ⁴ Département d'Informatique, École Normale Supérieure (ENS) PSL & CNRS, F-75230 Paris cedex 05, France. |
| Pseudocode | No | The paper describes mathematical models and derivations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code used in the present work is available here. |
| Open Datasets | No | The paper uses a synthetic dataset generated from isotropic Gaussian covariates, `f*(x) = σ*(θᵀx/√d), x ~ N(0, I_d)`, but does not provide access information (link, DOI, or citation to a public repository) for a publicly available dataset (a data-generation sketch follows the table). |
| Dataset Splits | No | The paper mentions training on subsets D₀ and D₁ with fixed ratios α₀ = n₀/d and α = n₁/d, but does not explicitly state or describe a distinct validation dataset split for hyperparameter tuning. |
| Hardware Specification | No | The paper mentions running simulations in various dimensions (e.g., 'd = p = 2000'), but no specific hardware details such as GPU/CPU models, processors, or memory are provided. |
| Software Dependencies | No | The paper mentions using activation functions like 'σ = tanh' and refers to some statistical physics methods, but it does not list specific software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9'). |
| Experiment Setup | Yes | Simulations were run in dimensions d = p = 2000, for a learning rate η = 2.5p, and a readout regularization λ = 0.01; the readout was trained with n₁ = 2d samples (Figure 1 caption). A minimal sketch of this one-step protocol follows the table. |
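For concreteness, the data model quoted in the Open Datasets row can be sketched in a few lines. This is a minimal illustration, assuming σ* = tanh and the helper name `sample_single_index_data`; neither is taken from the authors' released code:

```python
import numpy as np

def sample_single_index_data(n, d, theta, sigma_star=np.tanh, seed=None):
    """Draw n isotropic Gaussian covariates x ~ N(0, I_d) and single-index
    labels y = sigma_star(<theta, x> / sqrt(d)), as quoted above.

    The function name and the default sigma_star = tanh are illustrative
    assumptions; the paper only specifies the data model itself.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))          # rows are x ~ N(0, I_d)
    y = sigma_star(X @ theta / np.sqrt(d))   # f*(x) = sigma*(theta . x / sqrt(d))
    return X, y
```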
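The Experiment Setup row can likewise be read as a two-phase protocol: one full-batch gradient step on the first-layer weights, then ridge regression on the readout (the "protocol (3)" referenced in the Research Type row). The sketch below assumes squared loss, tanh activation, and the stated initialization scalings; only η = 2.5p, λ = 0.01, and n₁ = 2d are quoted from the Figure 1 caption:

```python
import numpy as np

def one_step_then_ridge(X0, y0, X1, y1, p, lam=0.01, seed=None):
    """Hedged sketch of the quoted setup: (i) one full-batch gradient step on
    the first-layer weights W (readout frozen) using (X0, y0); (ii) ridge
    regression of the readout on fresh samples (X1, y1) with penalty lam.
    All names, the squared loss, and the init scalings are assumptions."""
    n0, d = X0.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((p, d)) / np.sqrt(d)   # assumed init scaling
    a = rng.standard_normal(p) / np.sqrt(p)        # readout, frozen in step (i)
    eta = 2.5 * p                                   # learning rate from the caption

    # (i) one gradient step on the squared loss w.r.t. W, with sigma = tanh
    pre = X0 @ W.T / np.sqrt(d)                     # (n0, p) preactivations
    err = np.tanh(pre) @ a - y0                     # residuals on D0
    grad_W = ((err[:, None] * a[None, :]) * (1.0 - np.tanh(pre) ** 2)).T @ X0
    W -= eta * grad_W / (n0 * np.sqrt(d))

    # (ii) ridge regression of the readout on post-step features of D1
    Z = np.tanh(X1 @ W.T / np.sqrt(d))              # (n1, p) features
    a_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y1)
    return W, a_hat
```

Chaining the two sketches: draw D₀ and D₁ with the same θ via `sample_single_index_data` (using n₁ = 2d for the readout set, as quoted), run `one_step_then_ridge`, and evaluate `a_hat` on fresh samples from the same model.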