Surprising Instabilities in Training Deep Networks and a Theoretical Analysis

Authors: Yuxin Sun, Dong Lao, Ganesh Sundaramoorthi, Anthony Yezzi

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We discover and provide empirical evidence of restrained instabilities in current deep learning training practice. To this end, we show that optimization paths in current practice of training convolutional neural networks (CNNs) can diverge significantly due to the smallest errors from finite precision arithmetic. We show that the divergence can be eliminated with learning rate choice. (A sketch of this perturbed-training comparison appears after the table.)
Researcher Affiliation | Collaboration | Yuxin Sun (Georgia Institute of Technology), Dong Lao (UCLA), Ganesh Sundaramoorthi (Raytheon Technologies), Anthony Yezzi (Georgia Institute of Technology)
Pseudocode | No | The paper describes mathematical models and updates, but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | Our first experiment uses the ResNet-56 architecture [38], which we train on CIFAR-10 [39] using perturbed SGD... we repeated the same experiment for a different network (VGG16 [41]) and a different dataset (Fashion-MNIST [42]).
Dataset Splits | No | The paper mentions training and testing but does not provide the specific validation splits needed to reproduce the experiments.
Hardware Specification | No | The paper does not specify the hardware (exact GPU/CPU models or other system details) used to run its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not provide version numbers for the software dependencies needed to replicate the experiments.
Experiment Setup | Yes | We use the standard parameters for training this network [38]: lr = 0.1, batch size = 128, weight decay = 5e-4, momentum = 0.9. The standard step decay learning rate schedule is used: the learning rate is divided by 10 every 40 epochs, for a total of 200 epochs. (A configuration sketch follows the table.)
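
The setup quoted in the Experiment Setup row corresponds to a conventional PyTorch configuration. The following is a minimal sketch under stated assumptions: torchvision is used for data loading, a torchvision ResNet-18 stands in for ResNet-56 (which torchvision does not ship) purely to keep the sketch runnable, and augmentation and normalization choices are omitted because the excerpt does not specify them.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# CIFAR-10 with batch size 128, as quoted in the report (augmentation omitted).
train_set = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                         transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Stand-in model: substitute a CIFAR-style ResNet-56 implementation here.
model = torchvision.models.resnet18(num_classes=10)
criterion = nn.CrossEntropyLoss()

# Quoted hyperparameters: lr = 0.1, momentum = 0.9, weight decay = 5e-4.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# Step decay: divide the learning rate by 10 every 40 epochs, for 200 epochs total.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(200):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()

Details not listed in the row (augmentation, initialization, random seeds) would have to come from the paper or from reference [38].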
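
The Research Type and Open Datasets rows describe a perturbed-SGD comparison: two otherwise identical training runs separated only by an error on the order of finite-precision round-off, after which the divergence of the two optimization paths is observed. The excerpt does not specify how or where the perturbation is injected, so the sketch below makes assumptions: a toy convolutional model on synthetic batches, a single weight perturbation of about 1e-7 (roughly float32 round-off scale), identical data order for both runs, and the parameter-space distance between the runs as the divergence measure.

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the networks under study (the quoted experiments use ResNet-56 and VGG16).
model_a = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
model_b = copy.deepcopy(model_a)              # identical initialization

with torch.no_grad():
    for p in model_b.parameters():            # one-time perturbation at ~float32 round-off scale
        p.add_(1e-7 * torch.randn_like(p))

loss_fn = nn.CrossEntropyLoss()
lr = 0.1                                      # quoted learning rate

def sgd_step(model, inputs, targets):
    # Plain SGD step (momentum and weight decay omitted for clarity of the comparison).
    model.zero_grad()
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)

for step in range(200):
    inputs = torch.randn(128, 3, 32, 32)      # same synthetic batch fed to both runs
    targets = torch.randint(0, 10, (128,))
    sgd_step(model_a, inputs, targets)
    sgd_step(model_b, inputs, targets)
    # Parameter-space distance between the two optimization paths.
    dist = sum((pa - pb).norm() ** 2 for pa, pb in
               zip(model_a.parameters(), model_b.parameters())).sqrt()
    if step % 50 == 0:
        print(step, float(dist))

Replacing the toy model and synthetic batches with ResNet-56 or VGG16 on CIFAR-10 or Fashion-MNIST, as quoted in the Open Datasets row, would follow the spirit of the described experiment; the paper's exact perturbation protocol is not given in this excerpt.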