On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Authors: Aaron Defazio, Léon Bottou

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we study the behavior of variance reduction methods on a prototypical non-convex problem in machine learning: a deep convolutional neural network designed for image classification. We discuss in Section 2 how standard training and modeling techniques significantly complicate the application of variance reduction methods in practice, and how to overcome some of these issues. In Sections 3 & 5 we study empirically the amount of variance reduction seen in practice on modern CNN architectures, and we quantify the properties of the network that affect the amount of variance reduction. In Sections 6 & 8 we show that streaming variants of SVRG do not improve over regular SVRG despite their theoretical ability to handle data augmentation.
Researcher Affiliation | Industry | Aaron Defazio & Léon Bottou, Facebook AI Research, New York
Pseudocode | No | The paper describes algorithmic steps in prose (e.g., the SVRG step in Section 1) but does not include any structured pseudocode or algorithm blocks. A minimal sketch of that step is given after the table.
Open Source Code | Yes | Code to reproduce the experiments performed is provided on the first author's website.
Open Datasets | Yes | We study the behavior of variance reduction methods on a prototypical non-convex problem in machine learning: a deep convolutional neural network designed for image classification.
Dataset Splits | No | The paper mentions a 'CIFAR10 test problem' and 'ImageNet' but does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts for each partition) or a specific cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU models such as NVIDIA A100, CPU models, or memory specifications) used for running its experiments; it only implies computations were done using standard frameworks.
Software Dependencies | No | The paper mentions 'PyTorch' and 'TensorFlow (Abadi et al. [2015])' as standard libraries used but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | A batch size of 128 with momentum 0.9 and weight decay 0.0001 was used for all methods. Without-replacement data sampling was used. (A configuration sketch follows the table.)
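
Since the paper gives the SVRG step only in prose, here is a minimal PyTorch sketch of that update: w ← w − lr · (∇f_i(w) − ∇f_i(w̃) + ∇f(w̃)), where w̃ is a snapshot point. All names here (`model`, `snapshot`, `full_grad`) and the snapshot schedule are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of the SVRG update the paper describes in prose
# (Section 1). Names and structure are illustrative.
import torch

def svrg_step(model, snapshot, full_grad, loss_fn, x, y, lr):
    """One SVRG step: w <- w - lr * (grad_i(w) - grad_i(w_snap) + full_grad)."""
    # Mini-batch gradient at the current iterate w.
    model.zero_grad()
    loss_fn(model(x), y).backward()
    # Gradient of the same mini-batch at the snapshot point w_snap.
    snapshot.zero_grad()
    loss_fn(snapshot(x), y).backward()
    with torch.no_grad():
        for p, q, g in zip(model.parameters(),
                           snapshot.parameters(), full_grad):
            # Variance-reduced gradient estimate for this parameter.
            p -= lr * (p.grad - q.grad + g)

# Usage (typically once per epoch): freeze a snapshot and compute the
# full-dataset gradient at it in a separate pass, e.g.
#   snapshot = copy.deepcopy(model)
#   full_grad = [avg. of per-example gradients over the dataset at snapshot]
```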
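As a sketch of the reported setup (batch size 128, momentum 0.9, weight decay 0.0001, without-replacement sampling), assuming PyTorch and CIFAR10; the architecture (`resnet18`) and the learning rate are placeholders not stated in the excerpt.

```python
# Hedged configuration sketch of the reported hyperparameters.
import torch
from torchvision import datasets, models, transforms

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
# shuffle=True draws a fresh permutation each epoch, i.e.
# without-replacement sampling within an epoch.
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = models.resnet18(num_classes=10)          # stand-in architecture
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,              # assumed; not in the excerpt
                            momentum=0.9,
                            weight_decay=1e-4)
```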