Nondeterminism and Instability in Neural Network Optimization

Authors: Cecilia Summers, Michael J. Dinneen

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work, we establish an experimental protocol for understanding the effect of optimization nondeterminism on model diversity, allowing us to isolate the effects of a variety of sources of nondeterminism. Surprisingly, we find that all sources of nondeterminism have similar effects on measures of model diversity. To explain this intriguing fact, we identify the instability of model training, taken as an end-to-end procedure, as the key determinant. We show that even one-bit changes in initial parameters result in models converging to vastly different values. Last, we propose two approaches for reducing the effects of instability on run-to-run variability. ... We show the results of our protocol in this setting in Table 1. (A hedged PyTorch sketch of seed control and the one-bit perturbation appears after this table.)
Researcher Affiliation | Academia | Cecilia Summers¹, Michael J. Dinneen¹. ¹Department of Computer Science, University of Auckland, Auckland, New Zealand. Correspondence to: Cecilia Summers <cecilia.summers.07@gmail.com>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. Methods are described in prose.
Open Source Code | Yes | Code has been made publicly available: https://github.com/ceciliaresearch/nondeterminism_instability
Open Datasets | Yes | We begin our study of nondeterminism with the fundamental task of image classification. We execute our protocol with CIFAR-10 (Krizhevsky et al., 2009) as a testbed, a 10-way classification dataset with 50,000 training images of resolution 32×32 pixels and 10,000 images for testing. ... For these experiments, we employ a small quasi-recurrent neural network (QRNN) (Bradbury et al., 2016) on Penn Treebank (Marcus et al., 1993)... Experiments on MNIST (LeCun et al., 1998)... We perform larger-scale tests on ImageNet using 20 runs of a ResNet-18 (He et al., 2016)...
Dataset Splits | Yes | CIFAR-10 (Krizhevsky et al., 2009) as a testbed, a 10-way classification dataset with 50,000 training images of resolution 32×32 pixels and 10,000 images for testing. ... ImageNet validation set.
Hardware Specification | Yes | All experiments were done on two NVIDIA Tesla V100 GPUs with PyTorch (Paszke et al., 2019).
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | In these initial experiments, we use a 14-layer ResNet model (He et al., 2016), trained with a cosine learning rate decay (Loshchilov & Hutter, 2016) for 500 epochs with a maximum learning rate of 0.40, three epochs of linear learning rate warmup, a batch size of 512, momentum of 0.9, and weight decay of 5×10⁻⁴, obtaining a baseline accuracy of 90.0%. Data augmentation consists of random crops and horizontal flips.
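
The protocol quoted in the Research Type row involves two operations: pinning individual software sources of nondeterminism (seeds, cuDNN behavior) and perturbing the initial parameters by a single bit to probe instability. The following is a minimal PyTorch sketch of both, for illustration only; it is not taken from the authors' repository, and the names `fix_nondeterminism` and `flip_one_bit` are ours.

```python
# Illustrative only: names and structure are not from the paper's code release.
import random

import numpy as np
import torch
import torch.nn as nn


def fix_nondeterminism(seed: int = 0) -> None:
    """Pin the usual software sources of randomness for one training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                      # seeds CPU and all CUDA RNGs
    torch.backends.cudnn.deterministic = True    # deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False       # disable nondeterministic autotuning


def flip_one_bit(model: nn.Module) -> None:
    """Toggle the least-significant bit of one float32 weight, in place.

    This mimics a 'one-bit change in initial parameters'.
    Assumes the first parameter tensor is contiguous float32.
    """
    weights = next(model.parameters()).data      # raw storage, no autograd tracking
    bits = weights.view(-1).view(torch.int32)    # reinterpret the float bits as ints
    bits[0] ^= 1                                 # flip a single mantissa bit


if __name__ == "__main__":
    fix_nondeterminism(seed=0)
    model_a = nn.Linear(10, 10)

    fix_nondeterminism(seed=0)
    model_b = nn.Linear(10, 10)                  # bit-identical init to model_a
    flip_one_bit(model_b)                        # now differs by exactly one bit

    max_diff = (model_a.weight - model_b.weight).abs().max().item()
    print(f"max |delta weight| after one-bit flip: {max_diff:.3e}")
```

Training the two models to convergence and comparing their test-time behavior is the kind of pairwise comparison on which the paper's diversity measurements are built.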
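
The Experiment Setup row can likewise be read as a concrete training configuration. The sketch below uses standard torchvision components; the 14-layer ResNet itself is defined in the linked repository, so `torchvision.models.resnet18(num_classes=10)` stands in here, and the crop padding and the omission of input normalization are our assumptions rather than details quoted from the paper.

```python
# Sketch of the quoted CIFAR-10 setup; resnet18(num_classes=10) is a stand-in
# for the paper's 14-layer ResNet, and padding=4 / no normalization are assumptions.
import math

import torch
import torchvision
import torchvision.transforms as T

EPOCHS, WARMUP_EPOCHS = 500, 3
BATCH_SIZE, MAX_LR = 512, 0.40
MOMENTUM, WEIGHT_DECAY = 0.9, 5e-4

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),       # random crops (padding choice assumed)
    T.RandomHorizontalFlip(),          # horizontal flips
    T.ToTensor(),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=BATCH_SIZE, shuffle=True, num_workers=4, drop_last=True)

model = torchvision.models.resnet18(num_classes=10)   # stand-in for the 14-layer ResNet
optimizer = torch.optim.SGD(model.parameters(), lr=MAX_LR,
                            momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)


def lr_factor(epoch: int) -> float:
    """Three epochs of linear warmup, then cosine decay over the remaining epochs."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                   # learning rate is updated once per epoch
```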