Stochastic Variance Reduction for Nonconvex Optimization

Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present our empirical results in this section. In particular, we study multiclass classification using neural networks. This is a typical nonconvex problem encountered in machine learning. Experimental Setup. We train neural networks with one fully-connected hidden layer of 100 nodes and 10 softmax output nodes. We use ℓ2-regularization for training. We use CIFAR-10, MNIST, and STL-10 datasets for our experiments. Figure 1 shows the results.
Researcher Affiliation | Academia | Machine Learning Department, School of Computer Science, Carnegie Mellon University; Laboratory for Information & Decision Systems, Massachusetts Institute of Technology
Pseudocode | Yes | Algorithm 1 SVRG and Algorithm 2 GD-SVRG (a minimal sketch of the SVRG update appears after this table)
Open Source Code | No | The paper does not contain any explicit statement about releasing the source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We use CIFAR-10, MNIST, and STL-10 datasets for our experiments. These datasets are standard in the neural networks literature. The features in the datasets are normalized to the interval [0, 1]. All the datasets come with a predefined split into training and test datasets.
Dataset Splits | No | All the datasets come with a predefined split into training and test datasets.
Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances) used for running the experiments are mentioned.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | Yes | We train neural networks with one fully-connected hidden layer of 100 nodes and 10 softmax output nodes. We use ℓ2-regularization for training. The ℓ2 regularization is 1e-3 for CIFAR-10 and MNIST, and 1e-2 for STL-10. The step size is critical for SGD; we set it using the popular t-inverse schedule η_t = η_0(1 + η'_0⌊t/n⌋)^(-1), where η_0 and η'_0 are chosen so that SGD gives the best performance on the training loss. In our experiments, we also use η'_0 = 0; this results in a fixed step size for SGD. For SVRG, we use a fixed step size as suggested by our analysis. Again, the step size is chosen so that SVRG gives the best performance on the training loss. Initialization & mini-batching. Initialization is critical to training of neural networks. We use the normalized initialization in (Glorot & Bengio, 2010)... We use mini-batches of size 10 in our experiments... we use an epoch size m = n/10 in our experiments. (Illustrative sketches of the step-size schedule, training configuration, and initialization follow this table.)
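
For readers cross-checking the pseudocode claim, below is a minimal Python sketch of the mini-batch SVRG update described by Algorithm 1. This is not the authors' code; the `grad_fn` interface, variable names, and defaults are illustrative assumptions.

```python
import numpy as np

def svrg(grad_fn, x0, n, step_size, num_epochs, epoch_len, batch_size=10, seed=0):
    """Sketch of mini-batch SVRG for f(x) = (1/n) * sum_i f_i(x).

    grad_fn(x, idx) is assumed to return the average gradient of f_i(x)
    over the index array `idx` (an illustrative interface, not the paper's).
    """
    rng = np.random.default_rng(seed)
    snapshot = np.asarray(x0, dtype=float).copy()
    for _ in range(num_epochs):
        # Full gradient at the snapshot point, computed once per outer epoch.
        full_grad = grad_fn(snapshot, np.arange(n))
        x = snapshot.copy()
        for _ in range(epoch_len):
            idx = rng.integers(0, n, size=batch_size)
            # Variance-reduced stochastic gradient estimate.
            v = grad_fn(x, idx) - grad_fn(snapshot, idx) + full_grad
            x = x - step_size * v
        snapshot = x  # the analysis also allows returning a uniformly random iterate
    return snapshot
```

With the choices reported in the paper, `epoch_len` would be n/10, `batch_size` would be 10, and `step_size` would be a fixed value tuned on the training loss.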
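
The experiment-setup row can likewise be summarized as a small configuration sketch. The dictionary layout and the SGD step-size helper below are assumptions for illustration; the numeric values come from the quoted setup, and the tuned η_0 is not reported, so it is left as a parameter.

```python
def t_inverse_step_size(t, n, eta0, eta0_prime=0.0):
    """t-inverse schedule for SGD: eta_t = eta0 * (1 + eta0_prime * floor(t/n))**-1.

    With eta0_prime = 0 (as also used in the paper) this reduces to a fixed step size.
    """
    return eta0 / (1.0 + eta0_prime * (t // n))

# Illustrative summary of the reported setup (field names are not from the paper).
experiment_config = {
    "hidden_layers": [100],          # one fully-connected hidden layer
    "output": "softmax",             # 10 softmax output nodes
    "l2_regularization": {"cifar10": 1e-3, "mnist": 1e-3, "stl10": 1e-2},
    "minibatch_size": 10,
    "epoch_length": "n / 10",        # inner-loop length m = n/10
    "sgd_step_size": "t-inverse schedule, tuned on training loss",
    "svrg_step_size": "fixed, tuned on training loss",
}
```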
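
The "normalized initialization in (Glorot & Bengio, 2010)" mentioned in the setup is the standard uniform scheme sketched below; this reproduces the published formula rather than any code from the paper, and the function name is illustrative.

```python
import numpy as np

def glorot_normalized_init(fan_in, fan_out, rng=None):
    """Normalized initialization of Glorot & Bengio (2010):
    W ~ Uniform(-sqrt(6/(fan_in + fan_out)), +sqrt(6/(fan_in + fan_out))).
    """
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```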