The Early Phase of Neural Network Training
Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. |
| Researcher Affiliation | Collaboration | Jonathan Frankle (MIT CSAIL); David J. Schwab (CUNY ITS, Facebook AI Research); Ari S. Morcos (Facebook AI Research) |
| Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not contain any explicit statement about providing open-source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | Throughout this paper, we study five standard convolutional neural networks for CIFAR-10. |
| Dataset Splits | No | The paper mentions CIFAR-10 and reports evaluation accuracy, but it does not explicitly state training/validation/test split percentages, absolute sample counts, or references to predefined splits that would support reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | All networks follow the same training regime: we train with SGD for 160 epochs starting at learning rate 0.1 (momentum 0.9) and drop the learning rate by a factor of ten at epoch 80 and again at epoch 120. Training includes weight decay with weight 1e-4. Data is augmented with normalization, random flips, and random crops up to four pixels in any direction. Batch size: 128 (ResNet, WRN), 64 (VGG-13). (See the sketch after this table.) |
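
The reported setup maps onto a standard PyTorch training loop. The sketch below is not the authors' code (the paper releases none); the CIFAR-10 normalization statistics and the `resnet18` stand-in model are assumptions, flagged in the comments.

```python
# Hypothetical sketch of the training regime quoted above; not the authors' code.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Augmentation as described: normalization, random flips, and random crops up
# to four pixels in any direction. The mean/std values are the commonly used
# CIFAR-10 statistics (an assumption; the paper does not state them).
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True, num_workers=4)  # 64 for VGG-13

# Stand-in model: the paper studies five CIFAR-10 CNNs; any of them slots in here.
model = torchvision.models.resnet18(num_classes=10)

criterion = nn.CrossEntropyLoss()
# SGD, learning rate 0.1, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                            weight_decay=1e-4)
# Drop the learning rate by a factor of ten at epochs 80 and 120.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[80, 120], gamma=0.1)

for epoch in range(160):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```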