Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

Authors: Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical and experimental results suggest that previously studied model setups that provably give rise to double descent might not translate to optimizing two-layer neural networks. (Appendix F, Experiment Setup)
Researcher Affiliation | Collaboration | University of Toronto, Vector Institute, University of Tokyo, RIKEN AIP, Tsinghua University
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | No | The paper uses synthetically generated data based on a student-teacher setup and Gaussian features, not a publicly available or open dataset.
Dataset Splits | No | The paper discusses 'training samples' but does not provide specific details on how data is split into training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers.
Experiment Setup | Yes | Optimizing the Second Layer: We compute the minimum-norm solution by directly solving the pseudo-inverse. We set n = 1000 and vary γ1, γ2 from 0.1 to 3. The linear teacher model F(x) = x^T β is fixed as β = 1_d/√d. For each (γ1, γ2) we average across 50 random draws of data. Optimizing the First Layer: For both initializations, we use gradient descent with a small step size (η = 0.1) and train the model for at least 25000 steps and until ||∇_W f(X, W)||_F^2 < 10^{-6}. We fix n = 320 and vary γ1, γ2 from 0.1 to 3 with the same linear teacher model β = 1_d/√d. The risk is averaged across 20 models trained from different initializations.
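
The second-layer experiment described in the last row lends itself to a short, self-contained sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' code: it assumes Gaussian inputs, a tanh activation, a fixed random first layer, and that γ1 and γ2 denote the ratios d/n and h/n (input dimension and hidden width relative to sample size). Only the minimum-norm pseudo-inverse solve, the linear teacher β = 1_d/√d, n = 1000, and the averaging over random draws come from the setup quoted above.

```python
import numpy as np

def second_layer_risk(n=1000, gamma1=1.0, gamma2=1.0, n_test=2000, seed=0):
    """Minimum-norm second-layer fit on synthetic student-teacher data (sketch)."""
    rng = np.random.default_rng(seed)
    d = int(gamma1 * n)          # input dimension (assumed gamma1 = d/n)
    h = int(gamma2 * n)          # hidden width    (assumed gamma2 = h/n)

    beta = np.ones(d) / np.sqrt(d)                 # linear teacher F(x) = x^T beta
    X = rng.standard_normal((n, d))                # Gaussian training inputs
    y = X @ beta                                   # noiseless teacher labels

    W = rng.standard_normal((h, d)) / np.sqrt(d)   # fixed random first layer
    Phi = np.tanh(X @ W.T)                         # hidden features (assumed activation)

    # Minimum-norm least-squares solution for the second layer via the pseudo-inverse.
    a = np.linalg.pinv(Phi) @ y

    # Test risk on fresh Gaussian data drawn from the same teacher.
    X_test = rng.standard_normal((n_test, d))
    y_test = X_test @ beta
    preds = np.tanh(X_test @ W.T) @ a
    return np.mean((preds - y_test) ** 2)

# Example: average the risk over several random draws of the data, mirroring the
# 50-draw averaging described above (fewer draws here to keep the sketch cheap).
risks = [second_layer_risk(gamma1=0.5, gamma2=2.0, seed=s) for s in range(5)]
print(np.mean(risks))
```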
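For the first-layer experiment, the following is likewise a hedged sketch rather than the authors' implementation: it assumes the second layer is held fixed at a random initialization while W is trained, uses a tanh activation, and adds a max_steps safeguard that is not part of the quoted setup. The step size η = 0.1, the minimum of 25000 steps, the stopping criterion ||∇_W f(X, W)||_F^2 < 10^{-6}, and n = 320 are taken from the setup above.

```python
import numpy as np

def train_first_layer(n=320, gamma1=1.0, gamma2=1.0, eta=0.1,
                      min_steps=25000, tol=1e-6, max_steps=200000, seed=0):
    """Gradient descent on the first layer of a two-layer network (sketch)."""
    rng = np.random.default_rng(seed)
    d = int(gamma1 * n)                            # input dimension (assumed gamma1 = d/n)
    h = int(gamma2 * n)                            # hidden width    (assumed gamma2 = h/n)

    beta = np.ones(d) / np.sqrt(d)                 # linear teacher F(x) = x^T beta
    X = rng.standard_normal((n, d))                # Gaussian training inputs
    y = X @ beta

    W = rng.standard_normal((h, d)) / np.sqrt(d)   # trainable first layer
    a = rng.standard_normal(h) / np.sqrt(h)        # second layer, held fixed (assumption)

    for step in range(1, max_steps + 1):
        H = np.tanh(X @ W.T)                       # (n, h) hidden activations
        residual = H @ a - y                       # (n,) prediction errors
        # Gradient of the mean-squared loss with respect to W.
        dH = (residual[:, None] * a[None, :]) * (1.0 - H ** 2)
        grad_W = dH.T @ X / n                      # (h, d)
        W -= eta * grad_W                          # plain gradient descent step

        # Train for at least `min_steps` steps and stop once ||grad||_F^2 < tol.
        if step >= min_steps and np.sum(grad_W ** 2) < tol:
            break
    return W, a

# Example call at a single (gamma1, gamma2) point; the paper averages the
# resulting risk over 20 models trained from different initializations.
W, a = train_first_layer(gamma1=0.5, gamma2=2.0, seed=0)
```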