Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization
Authors: Zhouyuan Huo, Heng Huang
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods by optimizing multi-layer neural networks on two real datasets (MNIST and CIFAR-10), and experimental results demonstrate our theoretical analysis. |
| Researcher Affiliation | Academia | Zhouyuan Huo Dept. of Computer Science and Engineering University of Texas at Arlington Arlington, TX, 76019, USA zhouyuan.huo@mavs.uta.edu Heng Huang Dept. of Computer Science and Engineering University of Texas at Arlington Arlington, TX, 76019, USA heng@uta.edu |
| Pseudocode | Yes | Algorithm 1 Shared-Asy SVRG; Algorithm 2 Distributed-Asy SVRG Server Node; Algorithm 3 Distributed-Asy SVRG Worker Node k |
| Open Source Code | No | No explicit statement or link providing access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | We consider the multiclass classification task on MNIST dataset (LeCun et al. 1998)... We use CIFAR-10 dataset (Krizhevsky and Hinton 2009) in the experiment... |
| Dataset Splits | No | The paper mentions '10,000 training samples and 2,000 testing samples' for MNIST and '20,000 samples as training data and 4,000 samples as testing data' for CIFAR-10. It does not explicitly state the use of a validation set or provide detailed splitting methodology for reproducibility. |
| Hardware Specification | Yes | We conduct experiments on a machine which has 2 sockets, and each socket has 18 cores. We conduct distributed-memory architecture experiment on the AWS platform, and each node is a t2.micro instance with one virtual CPU. |
| Software Dependencies | No | The paper mentions the 'OpenMP library' and 'MPICH library' by name, and implies the use of 'TensorFlow', but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We construct a toy three-layer neural network (784-100-10), where ReLU activation function is used in the hidden layer. We train this neural network with softmax loss function, and ℓ2 regularization with weight C = 10⁻³. We set mini-batch size \|I_t\| = 10, and inner iteration length m = 1,000. Updating only one component of x in each iteration is too time-consuming, therefore we randomly select and update 1,000 components. We construct a three-layer fully connected neural network (384-50-10). In the hidden layer, we use ReLU activation function. We train this model with softmax loss, and ℓ2 regularization with weight C = 1e-4. In this experiment, mini-batch size \|I_t\| = 10, and the inner loop length m = 2,000. (A hedged sketch of the update rule these settings parameterize follows the table.) |
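
For context on the Pseudocode and Experiment Setup rows, the core computation in Algorithm 1 (Shared-Asy SVRG) is a mini-batch SVRG step: a variance-reduced gradient `v = grad_batch(x) - grad_batch(x̃) + full_grad(x̃)` applied to a randomly chosen subset of coordinates. The sketch below is a minimal serial version of that inner loop, not the authors' code: `svrg_inner_loop`, the `grad_fn(x, idx)` interface, and the least-squares toy problem are illustrative stand-ins, and the lock-free shared-memory and distributed server/worker execution described in the paper is omitted.

```python
import numpy as np

def svrg_inner_loop(x, grad_fn, n_samples, m=1_000, batch_size=10,
                    step_size=1e-2, n_coords=1_000, rng=None):
    """One outer epoch of mini-batch SVRG with sparse coordinate updates.

    grad_fn(x, idx) must return the gradient of the average loss over the
    samples in `idx`, evaluated at `x` (hypothetical interface).
    """
    rng = np.random.default_rng() if rng is None else rng
    x_snap = x.copy()                                   # snapshot x~
    full_grad = grad_fn(x_snap, np.arange(n_samples))   # full gradient at x~

    for _ in range(m):
        batch = rng.choice(n_samples, size=batch_size, replace=False)
        # Variance-reduced mini-batch gradient:
        #   v = grad_batch(x) - grad_batch(x~) + full_grad(x~)
        v = grad_fn(x, batch) - grad_fn(x_snap, batch) + full_grad
        # The paper updates only a random subset of coordinates per
        # iteration (1,000 of them); a random index set mimics that here.
        coords = rng.choice(x.size, size=min(n_coords, x.size), replace=False)
        x[coords] -= step_size * v[coords]
    return x

# Toy usage on a least-squares problem (illustrative only, not the
# paper's neural-network experiments):
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
b = A @ np.ones(50)

def grad_fn(x, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

x = svrg_inner_loop(np.zeros(50), grad_fn, n_samples=200, m=200,
                    batch_size=10, step_size=0.01, n_coords=50)
```

In the paper's MNIST experiment the corresponding settings are mini-batch size \|I_t\| = 10, inner loop length m = 1,000, and 1,000 randomly selected components updated per iteration; the asynchronous variants run this loop concurrently across threads or worker nodes.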