The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Authors: Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "we formally prove that, for a class of well-behaved input distributions, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically." (See the sketch after this table.)
Researcher Affiliation | Collaboration | Wei Hu: Princeton University (work partly performed at Google), huwei@cs.princeton.edu. Lechao Xiao: Google Research, Brain Team, xlc@google.com. Ben Adlam: Google Research, Brain Team (work done as a member of the Google AI Residency program, http://g.co/brainresidency), adlam@google.com. Jeffrey Pennington: Google Research, Brain Team, jpennin@google.com.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | Yes | "We perform experiments on a binary classification task from CIFAR-10 (cats vs horses) using a multi-layer FC network and a CNN."
Dataset Splits | No | The paper mentions "20,000 training samples and 2,000 test samples" for synthetic data and "10,000 training and 2,000 test data" for CIFAR-10, but does not explicitly provide information on a validation split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | No | The paper mentions network architecture details like activation function, width, and number of layers, but does not provide concrete hyperparameter values such as specific learning rates, batch sizes, or optimizer settings in the main text.
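
To make the claim in the Research Type row concrete, the sketch below trains a wide two-layer ReLU network and a plain linear model on the same data with full-batch gradient descent and reports how far apart their predictions are over the first steps. Everything here is an illustrative assumption rather than the paper's setup: the synthetic Gaussian data, the width, the learning rate, the standard (non-symmetrized) initialization, and the use of the change in the network's output from initialization in place of the paper's activation-dependent scaling constants.

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's exact parameterization):
# compare the early-time predictions of a wide two-layer ReLU network with
# those of a plain linear model on the raw inputs, both trained by
# full-batch gradient descent on the squared loss.

rng = np.random.default_rng(0)
n, d, m, lr, steps = 1000, 50, 4096, 0.1, 20   # samples, input dim, width, LR, GD steps

X = rng.standard_normal((n, d)) / np.sqrt(d)    # synthetic "well-behaved" inputs
y = np.sign(X @ rng.standard_normal(d))         # simple binary targets

W = rng.standard_normal((m, d))                 # first layer, standard init
a = rng.choice([-1.0, 1.0], size=m)             # second layer
beta, b = np.zeros(d), 0.0                      # linear model, initialized at zero

def net(X):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

f0 = net(X)                                     # network output at initialization

for t in range(steps):
    # Gradient descent step for the network (both layers trained).
    H = np.maximum(X @ W.T, 0.0)                # (n, m) hidden activations
    r = H @ a / np.sqrt(m) - y                  # residuals
    grad_a = H.T @ r / (np.sqrt(m) * n)
    grad_W = ((r[:, None] * (H > 0)) * (a / np.sqrt(m))).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

    # Gradient descent step for the linear model.
    r_lin = X @ beta + b - y
    beta -= lr * X.T @ r_lin / n
    b -= lr * r_lin.mean()

    # Gap between the network's change from init and the linear model's output.
    gap = np.abs((net(X) - f0) - (X @ beta + b)).mean()
    print(f"step {t + 1:2d}  mean |Δf_net - f_lin| = {gap:.4f}")
```

Reproducing the paper's quantitative agreement would require its symmetrized initialization and the specific scaling of the linear model derived there; the sketch only shows how such a comparison can be set up.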