Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization
Authors: Zhouyuan Huo, Heng Huang
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods by optimizing multi-layer neural networks on two real datasets (MNIST and CIFAR-10), and experimental results demonstrate our theoretical analysis. |
| Researcher Affiliation | Academia | Zhouyuan Huo Dept. of Computer Science and Engineering University of Texas at Arlington Arlington, TX, 76019, USA zhouyuan.huo@mavs.uta.edu Heng Huang Dept. of Computer Science and Engineering University of Texas at Arlington Arlington, TX, 76019, USA heng@uta.edu |
| Pseudocode | Yes | Algorithm 1 Shared-Asy SVRG; Algorithm 2 Distributed-Asy SVRG Server Node; Algorithm 3 Distributed-Asy SVRG Worker Node k |
| Open Source Code | No | No explicit statement or link providing access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | We consider the multiclass classification task on MNIST dataset (LeCun et al. 1998)... We use CIFAR-10 dataset (Krizhevsky and Hinton 2009) in the experiment... |
| Dataset Splits | No | The paper mentions '10,000 training samples and 2,000 testing samples' for MNIST and '20,000 samples as training data and 4,000 samples as testing data' for CIFAR-10. It does not explicitly state the use of a validation set or provide detailed splitting methodology for reproducibility. |
| Hardware Specification | Yes | We conduct experiments on a machine which has 2 sockets, and each socket has 18 cores. We conduct distributed-memory architecture experiment on the AWS platform, and each node is a t2.micro instance with one virtual CPU. |
| Software Dependencies | No | The paper mentions the 'OpenMP library' and 'MPICH library' by name, and implies the use of 'TensorFlow', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We construct a toy three-layer neural network (784-100-10), where ReLU activation function is used in the hidden layer. We train this neural network with softmax loss function, and ℓ2 regularization with weight C = 10⁻³. We set mini-batch size \|I_t\| = 10, and inner iteration length m = 1,000. Updating only one component of x in each iteration is too time-consuming, therefore we randomly select and update 1,000 components. We construct a three-layer fully connected neural network (384-50-10). In the hidden layer, we use ReLU activation function. We train this model with softmax loss, and ℓ2 regularization with weight C = 1e-4. In this experiment, mini-batch size \|I_t\| = 10, and the inner loop length m = 2,000. (A hedged sketch of the update rule these settings parameterize follows the table.) |
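
For context on the Pseudocode and Experiment Setup rows, the core computation in Algorithm 1 (Shared-Asy SVRG) is a mini-batch SVRG step: a variance-reduced gradient `v = grad_batch(x) - grad_batch(x̃) + full_grad(x̃)` applied to a randomly chosen subset of coordinates. The sketch below is a minimal serial version of that inner loop, not the authors' code: `svrg_inner_loop`, the `grad_fn(x, idx)` interface, and the least-squares toy problem are illustrative stand-ins, and the lock-free shared-memory and distributed server/worker execution described in the paper is omitted.

```python
import numpy as np

def svrg_inner_loop(x, grad_fn, n_samples, m=1_000, batch_size=10,
                    step_size=1e-2, n_coords=1_000, rng=None):
    """One outer epoch of mini-batch SVRG with sparse coordinate updates.

    grad_fn(x, idx) must return the gradient of the average loss over the
    samples in `idx`, evaluated at `x` (hypothetical interface).
    """
    rng = np.random.default_rng() if rng is None else rng
    x_snap = x.copy()                                   # snapshot x~
    full_grad = grad_fn(x_snap, np.arange(n_samples))   # full gradient at x~

    for _ in range(m):
        batch = rng.choice(n_samples, size=batch_size, replace=False)
        # Variance-reduced mini-batch gradient:
        #   v = grad_batch(x) - grad_batch(x~) + full_grad(x~)
        v = grad_fn(x, batch) - grad_fn(x_snap, batch) + full_grad
        # The paper updates only a random subset of coordinates per
        # iteration (1,000 of them); a random index set mimics that here.
        coords = rng.choice(x.size, size=min(n_coords, x.size), replace=False)
        x[coords] -= step_size * v[coords]
    return x

# Toy usage on a least-squares problem (illustrative only, not the
# paper's neural-network experiments):
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = A @ np.ones(50)

def grad_fn(x, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

x = svrg_inner_loop(np.zeros(50), grad_fn, n_samples=200, m=200,
                    batch_size=10, step_size=0.01, n_coords=50)
```

In the paper's MNIST experiment the corresponding settings are mini-batch size \|I_t\| = 10, inner loop length m = 1,000, and 1,000 randomly selected components updated per iteration; the asynchronous variants run this loop concurrently across threads or worker nodes.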