Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization
Authors: Cong Fang, Zhouchen Lin
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct experiments on a shared memory multi-core system to demonstrate the efficiency of our algorithm. In this section, we conduct experiments on a shared memory multi-core system to validate the efficiency of our algorithm empirically. |
| Researcher Affiliation | Academia | Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, P. R. China Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, P. R. China |
| Pseudocode | Yes | Algorithm 1: Serial SVRG; Algorithm 2: ASVRG (a minimal SVRG sketch appears after this table) |
| Open Source Code | No | The paper does not contain any explicit statement or link to open-source code for the described methodology. The supplementary material link is general and does not specify code availability. |
| Open Datasets | Yes | We experiment on two datasets: the MNIST dataset (http://yann.lecun.com/exdb/mnist/) and the CIFAR10 dataset (Krizhevsky and Hinton 2009). |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits (e.g., percentages or counts). It mentions using MNIST and CIFAR10 but not how they were partitioned for training or validation. |
| Hardware Specification | Yes | All the experiments are performed on an Intel multi-core 4-socket machine with 128 GB memory. Each socket is associated with 8 computation cores. |
| Software Dependencies | No | We implement all methods in C++ using POSIX threads as the parallel programming framework. The paper mentions software components but does not provide specific version numbers for the C++ compiler, the POSIX threads implementation, or any libraries, which are necessary for reproducibility. |
| Experiment Setup | Yes | For SVRG, we choose a fixed step size, and choose γ that gives the best performance on one core. When there is more than one core, the step size does not change. For SGD, the step size is chosen based on (Reddi et al. 2016), which is γ_t = γ_0(1 + γ′⌊t/n⌋)^(−1), where γ_0 and γ′ are chosen to give the best performance. We use the normalized initialization in (Glorot and Bengio 2010), (Reddi et al. 2016). The parameters are chosen uniformly from [−√(6/(ni + no)), √(6/(ni + no))], where ni and no are the numbers of input and output layers of the neural networks, respectively. We choose the mini-batch size to be 100, which is a common setting in training neural networks. For ASGD, we choose the mini-batch size to be 50, and the step size to be 10^(−4), which we find is better than the setting used in (Lian et al. 2015). A short code sketch of this step-size schedule and initialization follows the table. |
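
The table's pseudocode row names a serial SVRG routine (Algorithm 1). Below is a minimal sketch of that update scheme on a quadratic least-squares objective; the loss, function names, and hyperparameters are illustrative assumptions, not the paper's implementation (which was written in C++ with POSIX threads and applied to neural networks).

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Sketch of serial SVRG, assuming a least-squares loss for concreteness:
// per-sample gradient grad_i(w) = a_i * (a_i^T w - b_i). Names are hypothetical.
std::vector<double> svrg(const std::vector<std::vector<double>>& A,
                         const std::vector<double>& b,
                         double step, std::size_t epochs, std::size_t inner_iters) {
    std::size_t n = A.size(), d = A[0].size();
    std::vector<double> x(d, 0.0);
    std::mt19937 rng(0);
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);

    auto grad_i = [&](const std::vector<double>& w, std::size_t i) {
        double dot = 0.0;
        for (std::size_t j = 0; j < d; ++j) dot += A[i][j] * w[j];
        std::vector<double> g(d);
        for (std::size_t j = 0; j < d; ++j) g[j] = A[i][j] * (dot - b[i]);
        return g;
    };

    for (std::size_t s = 0; s < epochs; ++s) {
        // Snapshot point and its full gradient, recomputed once per outer epoch.
        std::vector<double> snapshot = x, full_grad(d, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            auto g = grad_i(snapshot, i);
            for (std::size_t j = 0; j < d; ++j) full_grad[j] += g[j] / n;
        }
        // Inner loop: variance-reduced stochastic gradient steps.
        for (std::size_t t = 0; t < inner_iters; ++t) {
            std::size_t i = pick(rng);
            auto g_x = grad_i(x, i), g_snap = grad_i(snapshot, i);
            for (std::size_t j = 0; j < d; ++j)
                x[j] -= step * (g_x[j] - g_snap[j] + full_grad[j]);
        }
    }
    return x;
}
```

ASVRG (Algorithm 2) parallelizes the inner loop asynchronously across threads; that lock-free shared-memory machinery is not reproduced here.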
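The experiment-setup row quotes two concrete formulas: the decaying SGD step size from (Reddi et al. 2016) and the normalized initialization of (Glorot and Bengio 2010). The sketch below restates both; γ_0, γ′, and the layer sizes are placeholders to be tuned as the paper describes.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Decaying SGD step size: gamma_t = gamma0 * (1 + gamma' * floor(t/n))^(-1).
// gamma0 and gamma_prime are tuning knobs; the paper picks whatever performs best.
double sgd_step_size(std::size_t t, std::size_t n, double gamma0, double gamma_prime) {
    return gamma0 / (1.0 + gamma_prime * static_cast<double>(t / n));  // integer t/n = floor
}

// Normalized (Glorot) initialization: weights drawn uniformly from
// [-sqrt(6/(n_in + n_out)), +sqrt(6/(n_in + n_out))] for a layer with
// n_in inputs and n_out outputs.
std::vector<double> glorot_init(std::size_t n_in, std::size_t n_out, std::mt19937& rng) {
    double bound = std::sqrt(6.0 / static_cast<double>(n_in + n_out));
    std::uniform_real_distribution<double> dist(-bound, bound);
    std::vector<double> w(n_in * n_out);
    for (auto& v : w) v = dist(rng);
    return w;
}
```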