Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

Authors: Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

NeurIPS 2020 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We formally prove that, for a class of well-behaved input distributions, the early-time learning dynamics of a two-layer fully-connected neural network can be mimicked by training a simple linear model on the inputs. We additionally argue that this surprising simplicity can persist in networks with more layers and with convolutional architecture, which we verify empirically." |
| Researcher Affiliation | Collaboration | Princeton University. Work partly performed at Google. Email: EMAIL. Google Research, Brain Team. Email: EMAIL. Google Research, Brain Team. Work done as a member of the Google AI Residency program (http://g.co/brainresidency). Email: EMAIL. Google Research, Brain Team. Email: EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes. |
| Open Datasets | Yes | "We perform experiments on a binary classification task from CIFAR-10 (cats vs. horses) using a multi-layer FC network and a CNN." |
| Dataset Splits | No | The paper mentions "20,000 training samples and 2,000 test samples" for synthetic data and "10,000 training and 2,000 test data" for CIFAR-10, but does not explicitly describe a validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiments. |
| Experiment Setup | No | The paper gives architecture details such as activation function, width, and number of layers, but does not provide concrete hyperparameter values (e.g., learning rates, batch sizes, or optimizer settings) in the main text. |