Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint
Authors: Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our theoretical and experimental results suggest that previously studied model setups that provably give rise to double descent might not translate to optimizing two-layer neural networks." (see also Appendix F, Experiment Setup) |
| Researcher Affiliation | Collaboration | University of Toronto, Vector Institute, University of Tokyo, RIKEN AIP, Tsinghua University |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper uses synthetically generated data based on a student-teacher setup and Gaussian features, not a publicly available or open dataset. |
| Dataset Splits | No | The paper discusses 'training samples' but does not provide specific details on how data is split into training, validation, and test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers. |
| Experiment Setup | Yes | Optimizing the Second Layer: "We compute the minimum-norm solution by directly solving the pseudo-inverse. We set n = 1000 and vary γ1, γ2 from 0.1 to 3. The linear teacher model F(x) = xᵀβ is fixed as β = 1_d/√d. For each (γ1, γ2) we average across 50 random draws of data." Optimizing the First Layer: "For both initializations, we use gradient descent with small step size (η = 0.1) and train the model for at least 25,000 steps and until ∥∇_W f(X, W)∥²_F < 10⁻⁶. We fix n = 320 and vary γ1, γ2 from 0.1 to 3 with the same linear teacher β = 1_d/√d. The risk is averaged across 20 models trained from different initializations." |
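The second-layer procedure quoted above is simple enough to sketch. The following is a minimal NumPy reconstruction, not the authors' code: it assumes γ1 = d/n and γ2 = h/n, standard Gaussian inputs, ReLU hidden units, and 1/√d first-layer weight scaling, none of which are pinned down by this excerpt alone.

```python
import numpy as np

def min_norm_second_layer_risk(n=1000, gamma1=1.0, gamma2=1.0,
                               n_test=2000, seed=0):
    """One draw of the second-layer experiment: fix random first-layer
    weights, solve for the minimum-norm second layer via the
    pseudo-inverse, and return the test risk against the linear
    teacher F(x) = x^T beta."""
    rng = np.random.default_rng(seed)
    d = int(gamma1 * n)   # input dimension, assuming gamma1 = d/n
    h = int(gamma2 * n)   # hidden width,    assuming gamma2 = h/n
    beta = np.ones(d) / np.sqrt(d)            # linear teacher beta = 1_d / sqrt(d)

    X = rng.standard_normal((n, d))           # Gaussian inputs
    W = rng.standard_normal((h, d)) / np.sqrt(d)  # hypothetical weight scale

    Phi = np.maximum(X @ W.T, 0.0)            # ReLU hidden features (assumed)
    y = X @ beta

    # Minimum-norm interpolating second layer: a = pinv(Phi) @ y
    a = np.linalg.pinv(Phi) @ y

    # Test risk on fresh samples from the same distribution
    X_test = rng.standard_normal((n_test, d))
    y_hat = np.maximum(X_test @ W.T, 0.0) @ a
    return np.mean((y_hat - X_test @ beta) ** 2)

# Average across 50 random draws of data for one (gamma1, gamma2) pair
risks = [min_norm_second_layer_risk(gamma1=0.5, gamma2=2.0, seed=s)
         for s in range(50)]
print(np.mean(risks))
```

Sweeping both ratios over 0.1 to 3 and averaging 50 seeds per grid point would then trace out the risk surface the row describes.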
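The first-layer procedure can be sketched the same way. The fixed second layer (random signs with 1/√h scaling) and the initialization scale are assumptions here; the excerpt mentions two initializations without specifying them, and the stopping rule follows the quoted criterion ∥∇_W f(X, W)∥²_F < 10⁻⁶ with a minimum of 25,000 steps.

```python
import numpy as np

def train_first_layer(n=320, gamma1=1.0, gamma2=1.0, eta=0.1,
                      min_steps=25000, tol=1e-6, max_steps=200000, seed=0):
    """Gradient descent on the first layer W with the second layer held
    fixed; runs for at least `min_steps` and stops once the squared
    Frobenius norm of the gradient falls below `tol`."""
    rng = np.random.default_rng(seed)
    d, h = int(gamma1 * n), int(gamma2 * n)   # assumed: gamma1 = d/n, gamma2 = h/n
    beta = np.ones(d) / np.sqrt(d)            # linear teacher beta = 1_d / sqrt(d)
    X = rng.standard_normal((n, d))
    y = X @ beta

    W = rng.standard_normal((h, d)) / np.sqrt(d)      # hypothetical init scale
    a = rng.choice([-1.0, 1.0], size=h) / np.sqrt(h)  # fixed second layer (assumed)

    for step in range(1, max_steps + 1):
        Z = X @ W.T                       # pre-activations, shape (n, h)
        r = np.maximum(Z, 0.0) @ a - y    # residuals of the ReLU student
        # Gradient of L(W) = (1/2n) * sum(r_i^2) with respect to W
        G = ((Z > 0) * np.outer(r, a)).T @ X / n
        W -= eta * G                      # small step size, eta = 0.1
        if step >= min_steps and np.sum(G ** 2) < tol:
            break
    return W, np.sum(G ** 2)
```

Averaging the resulting test risk over 20 runs with different seeds would mirror the reported protocol, up to the unspecified initialization schemes.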