Surprising Instabilities in Training Deep Networks and a Theoretical Analysis

Authors: Yuxin Sun, Dong Lao, Ganesh Sundaramoorthi, Anthony Yezzi

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We discover and provide empirical evidence of restrained instabilities in current deep learning training practice. To this end, we show that optimization paths in current practice of training convolutional neural networks (CNNs) can diverge significantly due to the smallest errors from finite precision arithmetic. We show that the divergence can be eliminated with learning rate choice. (A sketch of this perturbed-training comparison appears after the table.)
Researcher Affiliation | Collaboration | Yuxin Sun (Georgia Institute of Technology), Dong Lao (UCLA), Ganesh Sundaramoorthi (Raytheon Technologies), Anthony Yezzi (Georgia Institute of Technology)
Pseudocode | No | The paper describes mathematical models and updates, but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | Our first experiment uses the ResNet-56 architecture [38], which we train on CIFAR-10 [39] using perturbed SGD... we repeated the same experiment for a different network (VGG16 [41]) and a different dataset (Fashion-MNIST [42]).
Dataset Splits | No | The paper mentions training and testing but does not provide the specific validation splits needed to reproduce the experiments.
Hardware Specification | No | The paper does not specify the hardware (exact GPU/CPU models or other system details) used to run its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not provide version numbers for the software dependencies needed to replicate the experiments.
Experiment Setup | Yes | We use the standard parameters for training this network [38]: lr = 0.1, batch size = 128, weight decay = 5e-4, momentum = 0.9. The standard step decay learning rate schedule is used: the learning rate is divided by 10 every 40 epochs, for a total of 200 epochs. (A configuration sketch follows the table.)
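
The setup quoted in the Experiment Setup row corresponds to a conventional PyTorch configuration. The following is a minimal sketch under stated assumptions: torchvision is used for data loading, a torchvision ResNet-18 stands in for ResNet-56 (which torchvision does not ship) purely to keep the sketch runnable, and augmentation and normalization choices are omitted because the excerpt does not specify them.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# CIFAR-10 with batch size 128, as quoted in the report (augmentation omitted).
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                         transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Stand-in model: substitute a CIFAR-style ResNet-56 implementation here.
model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()

# Quoted hyperparameters: lr = 0.1, momentum = 0.9, weight decay = 5e-4.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# Step decay: divide the learning rate by 10 every 40 epochs, for 200 epochs total.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(200):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()

Details not listed in the row (augmentation, initialization, random seeds) would have to come from the paper or from reference [38].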
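
The Research Type and Open Datasets rows describe a perturbed-SGD comparison: two otherwise identical training runs separated only by an error on the order of finite-precision round-off, after which the divergence of the two optimization paths is observed. The excerpt does not specify how or where the perturbation is injected, so the sketch below makes assumptions: a toy convolutional model on synthetic batches, a single weight perturbation of about 1e-7 (roughly float32 round-off scale), identical data order for both runs, and the parameter-space distance between the runs as the divergence measure.

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the networks under study (the quoted experiments use ResNet-56 and VGG16).
model_a = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
model_b = copy.deepcopy(model_a)              # identical initialization

with torch.no_grad():
    for p in model_b.parameters():            # one-time perturbation at ~float32 round-off scale
        p.add_(1e-7 * torch.randn_like(p))

loss_fn = nn.CrossEntropyLoss()
lr = 0.1                                      # quoted learning rate

def sgd_step(model, inputs, targets):
    # Plain SGD step (momentum and weight decay omitted for clarity of the comparison).
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)

for step in range(200):
    inputs = torch.randn(128, 3, 32, 32)      # same synthetic batch fed to both runs
    targets = torch.randint(0, 10, (128,))
    sgd_step(model_a, inputs, targets)
    sgd_step(model_b, inputs, targets)
    # Parameter-space distance between the two optimization paths.
    dist = sum((pa - pb).norm() ** 2 for pa, pb in
               zip(model_a.parameters(), model_b.parameters())).sqrt()
    if step % 50 == 0:
        print(step, float(dist))

Replacing the toy model and synthetic batches with ResNet-56 or VGG16 on CIFAR-10 or Fashion-MNIST, as quoted in the Open Datasets row, would follow the spirit of the described experiment; the paper's exact perturbation protocol is not given in this excerpt.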