Surprising Instabilities in Training Deep Networks and a Theoretical Analysis
Authors: Yuxin Sun, Dong Lao, Ganesh Sundaramoorthi, Anthony Yezzi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We discover and provide empirical evidence of restrained instabilities in current deep learning training practice. To this end, we show that optimization paths in current practice of training convolutional neural networks (CNNs) can diverge significantly due to the smallest errors from finite precision arithmetic. We show that the divergence can be eliminated with learning rate choice. |
| Researcher Affiliation | Collaboration | Yuxin Sun (Georgia Institute of Technology), Dong Lao (UCLA), Ganesh Sundaramoorthi (Raytheon Technologies), Anthony Yezzi (Georgia Institute of Technology) |
| Pseudocode | No | The paper describes mathematical models and updates, but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide access to source code for the described methodology. |
| Open Datasets | Yes | Our first experiment uses the ResNet-56 architecture [38], which we train on CIFAR-10 [39] using perturbed SGD... we repeated the same experiment for a different network (VGG16 [41]) and a different dataset (Fashion-MNIST [42]). |
| Dataset Splits | No | The paper mentions training and testing but does not specify the train/validation/test splits needed to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not provide specific version numbers for the software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We use the standard parameters for training this network [38]: lr=0.1, batch size = 128, weight decay = 5e-4, momentum=0.9. The standard step decay learning rate schedule is used: the learning rate is divided by 10 every 40 epochs for a total of 200 epochs. |
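
The Experiment Setup row quotes the standard hyperparameters, but since the paper releases no code, the following is only a minimal PyTorch sketch of that configuration. torchvision has no built-in CIFAR-style ResNet-56, so `torchvision.models.resnet18(num_classes=10)` is used as a stand-in model, and the data transform is an assumption.

```python
# Sketch of the quoted setup: SGD with lr=0.1, batch size 128, weight decay 5e-4,
# momentum 0.9, and the learning rate divided by 10 every 40 epochs for 200 epochs.
# Model and transform are stand-ins, not the authors' released code.
import torch
import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=T.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = torchvision.models.resnet18(num_classes=10)  # stand-in for ResNet-56
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(200):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step decay: lr /= 10 every 40 epochs
```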
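The Research Type row describes training diverging under errors at the scale of finite-precision arithmetic ("perturbed SGD"). The sketch below illustrates one way to probe this: train two identically initialized copies, give one a one-time perturbation on the order of float32 round-off, and track how far their parameters drift apart. The perturbation scale and distance metric are assumptions, not the authors' protocol.

```python
# Divergence probe sketch: perturb one of two identical model copies by a tiny
# amount and compare parameter vectors after (identical) training. Assumed details:
# perturbation scale eps and the L2 parameter distance.
import copy
import torch
import torchvision

def perturb_(model, eps=1e-7):
    # One-time perturbation on the order of float32 machine epsilon (assumed scale).
    with torch.no_grad():
        for p in model.parameters():
            p.add_(eps * torch.sign(torch.randn_like(p)))

def param_distance(model_a, model_b):
    # L2 distance between the flattened parameter vectors of the two runs.
    return torch.sqrt(sum(((pa - pb) ** 2).sum()
                          for pa, pb in zip(model_a.parameters(), model_b.parameters())))

model_a = torchvision.models.resnet18(num_classes=10)  # stand-in, see sketch above
model_b = copy.deepcopy(model_a)
perturb_(model_b)

# Both copies would then be trained with the same SGD settings and data order;
# a growing param_distance(model_a, model_b) indicates diverging optimization paths.
print(param_distance(model_a, model_b))
```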