On the Ineffectiveness of Variance Reduced Optimization for Deep Learning
Authors: Aaron Defazio, Leon Bottou
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we study the behavior of variance reduction methods on a prototypical non-convex problem in machine learning: A deep convolutional neural network designed for image classification. We discuss in Section 2 how standard training and modeling techniques significantly complicate the application of variance reduction methods in practice, and how to overcome some of these issues. In Sections 3 & 5 we study empirically the amount of variance reduction seen in practice on modern CNN architectures, and we quantify the properties of the network that affect the amount of variance reduction. In Sections 6 & 8 we show that streaming variants of SVRG do not improve over regular SVRG despite their theoretical ability to handle data augmentation. |
| Researcher Affiliation | Industry | Aaron Defazio & Léon Bottou, Facebook AI Research, New York |
| Pseudocode | No | The paper describes algorithmic steps in prose (e.g., the SVRG step in Section 1) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce the experiments performed is provided on the first author's website. |
| Open Datasets | Yes | We study the behavior of variance reduction methods on a prototypical non-convex problem in machine learning: A deep convolutional neural network designed for image classification. |
| Dataset Splits | No | The paper mentions 'CIFAR10 test problem' and 'ImageNet' but does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts for each partition) or a specific cross-validation setup. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU models like NVIDIA A100, CPU models, or memory specifications) used for running its experiments. It only implies computations were done using standard frameworks. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'TensorFlow, Abadi et al. [2015]' as standard libraries used but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | A batch size of 128 with momentum 0.9 and weight decay 0.0001 was used for all methods. Without-replacement data sampling was used. |
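The table notes that the paper describes the SVRG step only in prose, without pseudocode. As a hedged illustration of what that step looks like, the sketch below applies SVRG to a toy least-squares problem; the quadratic objective, learning rate, and data shapes are illustrative assumptions, not taken from the paper (the paper's experiments instead use CNNs with batch size 128, momentum 0.9, and weight decay 0.0001). Without-replacement sampling matches the setup quoted above.

```python
import numpy as np

# Toy least-squares problem: f(w) = (1/n) * sum_i 0.5 * (x_i @ w - y_i)^2.
# All shapes and hyperparameters here are illustrative assumptions.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

def grad_i(w, i):
    # Gradient of the i-th term 0.5 * (x_i @ w - y_i)^2.
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    # Full-batch gradient of f at w.
    return X.T @ (X @ w - y) / n

w = np.zeros(d)
lr = 0.01
for epoch in range(20):
    w_snap = w.copy()             # snapshot point (w-tilde in SVRG notation)
    mu = full_grad(w_snap)        # full gradient computed at the snapshot
    for i in rng.permutation(n):  # without-replacement sampling over the data
        # SVRG step: stochastic gradient, corrected by the difference between
        # the snapshot's stochastic and full gradients (a variance-reduced
        # but still unbiased gradient estimate).
        w = w - lr * (grad_i(w, i) - grad_i(w_snap, i) + mu)
```

After 20 outer epochs on this well-conditioned problem, `w` recovers `w_true` closely; the snapshot/full-gradient pass per epoch is exactly the extra cost the paper weighs against the variance reduction observed in practice.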