Stochastic Variance Reduction for Nonconvex Optimization
Authors: Sashank J. Reddi, Ahmed Hefny, Suvrit Sra, Barnabas Poczos, Alex Smola
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present our empirical results in this section. In particular, we study multiclass classification using neural networks. This is a typical nonconvex problem encountered in machine learning. Experimental Setup. We train neural networks with one fully-connected hidden layer of 100 nodes and 10 softmax output nodes. We use ℓ2-regularization for training. We use CIFAR-10, MNIST, and STL-10 datasets for our experiments. Figure 1 shows the results. (A sketch of this model setup appears after the table.) |
| Researcher Affiliation | Academia | Machine Learning Department, School of Computer Science, Carnegie Mellon University Laboratory for Information & Decision Systems, Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1 (SVRG) and Algorithm 2 (GD-SVRG); a minimal sketch of the SVRG inner loop appears after the table. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We use CIFAR-10, MNIST, and STL-10 datasets for our experiments. These datasets are standard in the neural networks literature. The features in the datasets are normalized to the interval [0, 1]. All the datasets come with a predefined split into training and test datasets. |
| Dataset Splits | No | All the datasets come with a predefined split into training and test datasets. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances) used for running the experiments are mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | We train neural networks with one fully-connected hidden layer of 100 nodes and 10 softmax output nodes. We use ℓ2-regularization for training. The ℓ2 regularization is 1e-3 for CIFAR-10 and MNIST, and 1e-2 for STL-10. The step size is critical for SGD; we set it using the popular t-inverse schedule η_t = η_0 (1 + η_0′ ⌊t/n⌋)^(−1), where η_0 and η_0′ are chosen so that SGD gives the best performance on the training loss. In our experiments, we also use η_0′ = 0; this results in a fixed step size for SGD. For SVRG, we use a fixed step size as suggested by our analysis. Again, the step size is chosen so that SVRG gives the best performance on the training loss. Initialization & mini-batching. Initialization is critical to training of neural networks. We use the normalized initialization in (Glorot & Bengio, 2010)... We use mini-batches of size 10 in our experiments... we use an epoch size m = n/10 in our experiments. (Sketches of the step-size schedule and the SVRG loop appear after the table.) |
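
The experiment-setup rows describe the model only in prose. Below is a minimal sketch of that setup, not the authors' code: one fully-connected hidden layer of 100 nodes, 10 softmax outputs, normalized (Glorot & Bengio, 2010) initialization, and an ℓ2 penalty (1e-3 for CIFAR-10/MNIST, 1e-2 for STL-10). The ReLU hidden activation and all function and variable names are assumptions made for illustration.

```python
import numpy as np

def init_params(d_in, d_hidden=100, d_out=10, seed=0):
    # Normalized (Glorot & Bengio, 2010) initialization, as stated in the paper.
    rng = np.random.default_rng(seed)
    s1 = np.sqrt(6.0 / (d_in + d_hidden))
    s2 = np.sqrt(6.0 / (d_hidden + d_out))
    return {
        "W1": rng.uniform(-s1, s1, (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.uniform(-s2, s2, (d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }

def regularized_loss(params, X, y, lam=1e-3):
    # Softmax cross-entropy of a one-hidden-layer net plus an l2 penalty
    # (lam = 1e-3 for CIFAR-10/MNIST, 1e-2 for STL-10, per the paper).
    h = np.maximum(X @ params["W1"] + params["b1"], 0.0)   # hidden layer (ReLU assumed)
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(y)), y].mean()
    reg = lam * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    return nll + reg
```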
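
The t-inverse schedule quoted above, η_t = η_0 (1 + η_0′ ⌊t/n⌋)^(−1), reduces to a small helper; setting η_0′ = 0 recovers the fixed step size the paper also uses for SGD, and SVRG uses a fixed step size throughout. The function name is illustrative.

```python
def t_inverse_step(t, n, eta0, eta0_prime):
    # eta_t = eta0 * (1 + eta0' * floor(t / n))^(-1); eta0' = 0 gives a constant step.
    return eta0 / (1.0 + eta0_prime * (t // n))
```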
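
For the pseudocode row, the following is a minimal sketch of the SVRG update pattern (Algorithm 1 in the paper) on a generic finite-sum objective, assuming a user-supplied mini-batch gradient oracle `grad(w, idx)`. The mini-batch size b = 10 and epoch length m = n/10 follow the setup quoted above; the step size, epoch count, and last-iterate snapshot rule are illustrative choices, not necessarily the paper's exact options.

```python
import numpy as np

def svrg(grad, w0, n, step=0.05, b=10, outer_epochs=20, seed=0):
    """Variance-reduced SGD on f(w) = (1/n) * sum_i f_i(w).

    grad(w, idx) must return the average gradient of f_i over the indices idx.
    """
    rng = np.random.default_rng(seed)
    m = n // 10                                  # inner-loop length, m = n/10
    w_snap = np.array(w0, dtype=float)           # snapshot iterate
    for _ in range(outer_epochs):
        full_grad = grad(w_snap, np.arange(n))   # full gradient at the snapshot
        w = w_snap.copy()
        for _ in range(m):
            idx = rng.integers(0, n, size=b)     # sample a mini-batch of size b = 10
            # variance-reduced gradient estimate
            v = grad(w, idx) - grad(w_snap, idx) + full_grad
            w = w - step * v                     # fixed step size, as in the paper's analysis
        w_snap = w                               # last-iterate snapshot (illustrative choice)
    return w_snap
```

Plugging a mini-batch gradient of the regularized network loss sketched above into `grad` gives the kind of training loop the experiments describe; the hyperparameter values here are placeholders.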