The Early Phase of Neural Network Training

Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset.
Researcher Affiliation | Collaboration | Jonathan Frankle (MIT CSAIL); David J. Schwab (CUNY ITS, Facebook AI Research); Ari S. Morcos (Facebook AI Research)
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper.
Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the methodology, nor a link to a code repository.
Open Datasets | Yes | Throughout this paper, we study five standard convolutional neural networks for CIFAR-10.
Dataset Splits | No | The paper mentions CIFAR-10 and evaluation accuracy, but it does not state training/validation/test split percentages or absolute sample counts, and it does not cite predefined splits that would support reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies, such as library names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | All networks follow the same training regime: we train with SGD for 160 epochs starting at learning rate 0.1 (momentum 0.9) and drop the learning rate by a factor of ten at epoch 80 and again at epoch 120. Training includes weight decay with weight 1e-4. Data is augmented with normalization, random flips, and random crops up to four pixels in any direction. Batch size: 128 for ResNet and WRN; 64 for VGG-13.
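The Experiment Setup row above contains enough detail to reconstruct the training pipeline. The following PyTorch sketch is one plausible reading of it, not the authors' code (none is released): torchvision's resnet18 stands in for the five CIFAR-10 networks studied, and the normalization statistics, data root, and worker count are conventional assumptions the paper does not specify. Note also that CIFAR-10 ships with a predefined 50,000/10,000 train/test split, which the paper presumably relies on even though it never states splits explicitly.

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Per-channel CIFAR-10 normalization statistics (a common convention;
# the paper does not specify which values it uses).
normalize = transforms.Normalize((0.4914, 0.4822, 0.4465),
                                 (0.2470, 0.2435, 0.2616))

# "normalization, random flips, and random crops up to four pixels
# in any direction"
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
test_transform = transforms.Compose([transforms.ToTensor(), normalize])

# CIFAR-10 comes with a predefined 50,000/10,000 train/test split.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
test_set = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=test_transform)

# Batch size 128 for ResNet/WRN; the paper uses 64 for VGG-13.
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True, num_workers=2)

# Stand-in model (an assumption): torchvision's resnet18; the paper's
# five networks are CIFAR-specific architectures it names elsewhere.
model = torchvision.models.resnet18(num_classes=10)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Drop the learning rate by a factor of ten at epochs 80 and 120.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1)

for epoch in range(160):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Even with this sketch, the missing hardware and software-version details noted above mean exact reproduction of the paper's numbers is not guaranteed.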