On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Authors: Aaron Defazio, Léon Bottou

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we study the behavior of variance reduction methods on a prototypical non-convex problem in machine learning: a deep convolutional neural network designed for image classification. We discuss in Section 2 how standard training and modeling techniques significantly complicate the application of variance reduction methods in practice, and how to overcome some of these issues. In Sections 3 & 5 we study empirically the amount of variance reduction seen in practice on modern CNN architectures, and we quantify the properties of the network that affect the amount of variance reduction. In Sections 6 & 8 we show that streaming variants of SVRG do not improve over regular SVRG despite their theoretical ability to handle data augmentation.
Researcher Affiliation | Industry | Aaron Defazio & Léon Bottou, Facebook AI Research, New York
Pseudocode | No | The paper describes algorithmic steps in prose (e.g., the SVRG step in Section 1) but does not include any structured pseudocode or algorithm blocks. A minimal sketch of that step is given after the table.
Open Source Code | Yes | Code to reproduce the experiments performed is provided on the first author's website.
Open Datasets | Yes | We study the behavior of variance reduction methods on a prototypical non-convex problem in machine learning: a deep convolutional neural network designed for image classification.
Dataset Splits | No | The paper mentions a 'CIFAR10 test problem' and 'ImageNet' but does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts for each partition) or a specific cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU models such as NVIDIA A100, CPU models, or memory specifications) used for running its experiments; it only implies computations were done using standard frameworks.
Software Dependencies | No | The paper mentions 'PyTorch' and 'TensorFlow (Abadi et al. [2015])' as standard libraries used but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | A batch size of 128 with momentum 0.9 and weight decay 0.0001 was used for all methods. Without-replacement data sampling was used. (A configuration sketch follows the table.)
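
Since the paper gives the SVRG step only in prose, here is a minimal PyTorch sketch of that update: w ← w − lr · (∇f_i(w) − ∇f_i(w̃) + ∇f(w̃)), where w̃ is a snapshot point. All names here (`model`, `snapshot`, `full_grad`) and the snapshot schedule are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the SVRG update the paper describes in prose
# (Section 1). Names and structure are illustrative.
import torch

def svrg_step(model, snapshot, full_grad, loss_fn, x, y, lr):
    """One SVRG step: w <- w - lr * (grad_i(w) - grad_i(w_snap) + full_grad)."""
    # Mini-batch gradient at the current iterate w.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    # Gradient of the same mini-batch at the snapshot point w_snap.
    snapshot.zero_grad()
    loss_fn(snapshot(x), y).backward()
    with torch.no_grad():
        for p, q, g in zip(model.parameters(),
                           snapshot.parameters(), full_grad):
            # Variance-reduced gradient estimate for this parameter.
            p -= lr * (p.grad - q.grad + g)

# Usage (typically once per epoch): freeze a snapshot and compute the
# full-dataset gradient at it in a separate pass, e.g.
#   snapshot = copy.deepcopy(model)
#   full_grad = [avg. of per-example gradients over the dataset at snapshot]
```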
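As a sketch of the reported setup (batch size 128, momentum 0.9, weight decay 0.0001, without-replacement sampling), assuming PyTorch and CIFAR10; the architecture (`resnet18`) and the learning rate are placeholders not stated in the excerpt.

```python
# Hedged configuration sketch of the reported hyperparameters.
import torch
from torchvision import datasets, models, transforms

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
# shuffle=True draws a fresh permutation each epoch, i.e.
# without-replacement sampling within an epoch.
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10)          # stand-in architecture
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,              # assumed; not in the excerpt
                            momentum=0.9,
                            weight_decay=1e-4)
```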