Stochastic Nested Variance Reduction for Nonconvex Optimization

Authors: Dongruo Zhou, Pan Xu, Quanquan Gu

NeurIPS 2018

Reproducibility assessment (variable — result — supporting evidence):
Research Type — Experimental. Evidence: "In this section, we compare our algorithm SNVRG with other baseline algorithms on training a convolutional neural network for image classification. We plotted the training loss and test error for different algorithms on each dataset in Figure 3."
Researcher Affiliation — Academia. Evidence: Dongruo Zhou, Pan Xu, and Quanquan Gu are all with the Department of Computer Science, University of California, Los Angeles, CA 90095 (drzhou@cs.ucla.edu, panxu@cs.ucla.edu, qgu@cs.ucla.edu).
Pseudocode — Yes. Evidence: Algorithm 1, One-epoch-SNVRG(x0, F, K, M, {Tl}, {Bl}, B); Algorithm 2, SNVRG; Algorithm 3, SNVRG-PL.
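This report does not reproduce the pseudocode itself. Purely as an illustration of the nested variance-reduction idea, the sketch below implements the special case K = 1, in which the One-epoch-SNVRG estimator reduces to an SVRG-style semi-stochastic gradient; all names (one_epoch_snvrg_k1, grad_fi, eta) and the plain gradient step are our simplifications, not the paper's algorithm.

```python
import numpy as np

def one_epoch_snvrg_k1(x0, grad_fi, n, B, B1, T1, eta, rng):
    """K = 1 sketch: v_t = g^(0) + g^(1), a two-level nested estimator.

    grad_fi(x, i) should return the gradient of the i-th component f_i at x;
    rng can be e.g. np.random.default_rng(0).
    """
    x_ref = x0.copy()
    # g^(0): large-batch gradient at the reference point (batch size B)
    idx0 = rng.choice(n, size=B, replace=False)
    g0 = np.mean([grad_fi(x_ref, i) for i in idx0], axis=0)

    x = x0.copy()
    for _ in range(T1):
        # g^(1): small-batch correction between the current iterate and
        # the reference point (batch size B1)
        idx1 = rng.choice(n, size=B1, replace=False)
        g1 = np.mean([grad_fi(x, i) - grad_fi(x_ref, i) for i in idx1], axis=0)
        x = x - eta * (g0 + g1)
    return x
```

Roughly speaking, for general K the paper's Algorithm 1 keeps K + 1 reference points refreshed at different frequencies, governed by the loop lengths {Tl} and batch sizes {Bl}, and sums K + 1 such correction terms into one estimator.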
Open Source Code — No. The paper describes the implementation environment ("All algorithm are implemented in Pytorch platform version 0.4.0 within Python 3.6.4.") but contains no statement about releasing source code and no link to a repository.
Open Datasets — Yes. Evidence: "We use three image datasets: (1) The MNIST dataset [42] consists of handwritten digits and has 50,000 training examples and 10,000 test examples. (2) CIFAR10 dataset [22] consists of images in 10 classes and has 50,000 training examples and 10,000 test examples. (3) SVHN dataset [33] consists of images of digits and has 531,131 training examples and 26,032 test examples."
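The paper does not say how the datasets were obtained or preprocessed. Under its stated PyTorch stack, all three are available through torchvision (our assumption; the paper does not mention torchvision, and the ToTensor-only preprocessing below is a placeholder):

```python
import torchvision
import torchvision.transforms as transforms

transform = transforms.ToTensor()  # preprocessing is not specified in the paper

# MNIST: note torchvision's train split has 60,000 examples; the 50,000
# figure quoted above suggests a held-out subset the paper does not describe.
mnist_train = torchvision.datasets.MNIST('./data', train=True, download=True, transform=transform)
cifar_train = torchvision.datasets.CIFAR10('./data', train=True, download=True, transform=transform)
# SVHN takes a `split` argument; the 531,131 training examples quoted above
# match the size of the 'extra' split (the 'train' split has 73,257).
svhn_extra = torchvision.datasets.SVHN('./data', split='extra', download=True, transform=transform)
svhn_test = torchvision.datasets.SVHN('./data', split='test', download=True, transform=transform)
```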
Dataset Splits — No. The paper gives training and test set sizes (e.g., "50,000 training examples and 10,000 test examples" for MNIST) but mentions no validation split and no details for reproducing one.
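Since no validation split is specified, the following is one hypothetical way to carve MNIST's 60,000 training examples into the 50,000/10,000 proportions quoted above; the seed, batch size, and use of SubsetRandomSampler are all our choices, not the paper's (mnist_train is the dataset object from the previous sketch):

```python
import torch
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

n_train, n_val = 50_000, 10_000
g = torch.Generator().manual_seed(0)  # arbitrary seed, not from the paper
perm = torch.randperm(n_train + n_val, generator=g).tolist()

train_loader = DataLoader(mnist_train, batch_size=256,
                          sampler=SubsetRandomSampler(perm[:n_train]))
val_loader = DataLoader(mnist_train, batch_size=256,
                        sampler=SubsetRandomSampler(perm[n_train:]))
```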
Hardware Specification — Yes. Evidence: "All experiments are conducted on Amazon AWS p2.xlarge servers which comes with Intel Xeon E5 CPU and NVIDIA Tesla K80 GPU (12G GPU RAM)."
Software Dependencies — Yes. Evidence: "All algorithm are implemented in Pytorch platform version 0.4.0 within Python 3.6.4."
Experiment Setup — Yes. Evidence: "For SGD, we search the batch size from {256, 512, 1024, 2048} and the initial step sizes from {1, 0.1, 0.01}. Following the convention of deep learning practice, we apply learning rate decay schedule to each algorithm with the learning rate decayed by 0.1 every 20 epochs."
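The stated grid search and decay schedule map directly onto standard PyTorch primitives. A minimal sketch, assuming hypothetical build_model and train_one_epoch helpers (the paper does not describe its training loop, and the epoch budget below is a placeholder):

```python
import itertools
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

num_epochs = 60  # total epoch budget is not stated in the excerpt

for batch_size, lr in itertools.product([256, 512, 1024, 2048], [1.0, 0.1, 0.01]):
    model = build_model()                     # hypothetical model constructor
    optimizer = optim.SGD(model.parameters(), lr=lr)
    scheduler = StepLR(optimizer, step_size=20, gamma=0.1)  # decay by 0.1 every 20 epochs
    for epoch in range(num_epochs):
        train_one_epoch(model, optimizer, batch_size)       # hypothetical helper
        scheduler.step()
```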