Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Surprising Instabilities in Training Deep Networks and a Theoretical Analysis

Authors: Yuxin Sun, Dong Lao, Ganesh Sundaramoorthi, Anthony Yezzi

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We discover and provide empirical evidence of restrained instabilities in current deep learning training practice. To this end, we show that optimization paths in current practice of training convolutional neural networks (CNNs) can diverge significantly due to the smallest errors from finite precision arithmetic. We show that the divergence can be eliminated with learning rate choice.
Researcher Affiliation | Collaboration | Yuxin Sun (Georgia Institute of Technology), Dong Lao (UCLA), Ganesh Sundaramoorthi (Raytheon Technologies), Anthony Yezzi (Georgia Institute of Technology)
Pseudocode | No | The paper describes mathematical models and update equations but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | Our first experiment uses the ResNet-56 architecture [38], which we train on CIFAR-10 [39] using perturbed SGD... we repeated the same experiment for a different network (VGG16 [41]) and a different dataset (Fashion-MNIST [42]).
Dataset Splits | No | The paper mentions training and testing but does not provide the specific validation splits needed to reproduce the experiment.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or detailed machine specifications) used to run its experiments.
Software Dependencies | No | The paper mentions PyTorch but does not provide version numbers for the software dependencies needed to replicate the experiment.
Experiment Setup | Yes | We use the standard parameters for training this network [38]: lr = 0.1, batch size = 128, weight decay = 5e-4, momentum = 0.9. The standard step decay learning rate schedule is used: the learning rate is divided by 10 every 40 epochs for a total of 200 epochs.
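The divergence mechanism quoted under Research Type begins with rounding error and is then amplified or damped by the learning rate. A minimal sketch of both ingredients, not taken from the paper: a float32 summation-order discrepancy as the perturbation seed, and a toy one-dimensional gradient-descent model (the names `seed` and `gap_after` are chosen here for illustration) showing how the step size controls whether that seed grows or shrinks:

```python
import numpy as np

# Seed: float32 addition is not associative, so reordering a gradient
# reduction changes the result at the level of machine epsilon.
xs = np.concatenate(([np.float32(1.0)], np.full(200, 1e-8, dtype=np.float32)))
fwd = np.float32(0.0)
for v in xs:
    fwd += v                      # big value first: the small terms are lost
rev = np.float32(0.0)
for v in xs[::-1]:
    rev += v                      # small values accumulate before the 1.0
seed = abs(float(fwd) - float(rev))
assert seed > 0.0                 # a nonzero rounding discrepancy

# Amplification: for gradient descent on a quadratic with curvature h, the
# gap between two runs evolves as d_{t+1} = |1 - lr*h| * d_t, so the seed
# grows exponentially when |1 - lr*h| > 1 and decays when it is < 1.
def gap_after(lr, h, d0, steps=100):
    return d0 * abs(1.0 - lr * h) ** steps

print(gap_after(lr=2.5, h=1.0, d0=seed))   # unstable step size: gap grows
print(gap_after(lr=0.5, h=1.0, d0=seed))   # stable step size: gap shrinks
```

This is only a linear caricature; the paper's point is that the same amplify-or-damp behavior shows up in full CNN training, where the seed comes for free from finite precision arithmetic.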
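The Experiment Setup row specifies the schedule completely, so it can be written down directly. A sketch of that step decay in plain Python (`step_decay_lr` is a name chosen here, not from the paper):

```python
def step_decay_lr(epoch, base_lr=0.1, drop=10.0, every=40):
    """Step decay quoted in the setup: lr is divided by 10 every 40 epochs."""
    return base_lr / drop ** (epoch // every)

# 200-epoch run: epochs 0-39 -> 0.1, 40-79 -> 0.01, ..., 160-199 -> 1e-5
schedule = [step_decay_lr(e) for e in range(200)]
print(schedule[0], schedule[40], schedule[199])
```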