Asynchronous Distributed Semi-Stochastic Gradient Optimization

Authors: Ruiliang Zhang, Shuai Zheng, James T. Kwok

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on the Google Cloud Computing Platform demonstrate that the proposed algorithm outperforms state-of-the-art distributed asynchronous algorithms in terms of both wall clock time and solution quality."
Researcher Affiliation | Academia | "Ruiliang Zhang, Shuai Zheng, James T. Kwok, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, {rzhangaf, szhengac, jamesk}@cse.ust.hk"
Pseudocode | Yes | "Algorithm 1: Stochastic variance reduced gradient (SVRG) (Johnson and Zhang 2013); Algorithm 2: Scheduler; Algorithm 3: Worker p receiving an update/evaluation task t at stage s; Algorithm 4: Daemon thread of the server; Algorithm 5: Computing thread of the server." (See the SVRG sketch after the table.)
Open Source Code | No | "The Petuum SGD code is downloaded from http://petuum.github.io/, while the other asynchronous algorithms are implemented in C++ by reusing most of our system's codes." No concrete access to the authors' own code is provided.
Open Datasets | Yes | "Experiments are performed on the Mnist8m and DNA data sets (Table 1) from the LibSVM archive [1] and Pascal Large Scale Learning Challenge [2]." [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ [2] http://largescale.ml.tu-berlin.de/ (See the data-loading sketch after the table.)
Dataset Splits | No | "All other parameters are tuned by a validation set, which is 1% of the data set." This mentions the use of a validation set but not explicit training/test splits. (See the validation-split sketch after the table.)
Hardware Specification | Yes | "Using the Google Cloud Computing Platform, we set up a cluster with 18 computing nodes. Each node is a Google Cloud n1-highmem-8 instance with eight cores and 52GB memory. Each scheduler/server takes one instance, while each worker takes a core."
Software Dependencies | No | "The system is implemented in C++, with the ZeroMQ package for communication." This mentions software names but no version numbers. (See the messaging sketch after the table.)
Experiment Setup | Yes | "We use 128 workers. To maximize parallelism, we fix τ to 128. (...) For distr-vr-sgd and distr-svrg, the number of stages is S = 50, and the number of iterations in each stage is m = N/B, where B is about 10% of each worker's local data set size. (...) the learning rate used by distr-svrg (as determined by the validation set) is small (10^-6 vs. 10^-3 in distr-vr-sgd)." (See the parameter sketch after the table.)
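
The Pseudocode row cites SVRG (Johnson and Zhang 2013) as Algorithm 1. For reference, here is a minimal single-machine SVRG sketch in Python; the gradient interface, default hyperparameters, and variable names are illustrative assumptions, and the distributed scheduler/worker/server machinery of Algorithms 2-5 is not reproduced.

```python
# Minimal single-machine SVRG sketch (Johnson and Zhang 2013); illustrative only.
import numpy as np

def svrg(grad_i, w0, n, num_stages=50, m=None, lr=1e-3, seed=0):
    """grad_i(w, i) returns the gradient of the i-th component function at w."""
    rng = np.random.default_rng(seed)
    m = m if m is not None else n        # iterations per stage (the paper uses m = N/B)
    w_tilde = np.asarray(w0, dtype=float).copy()
    for _ in range(num_stages):
        # Full gradient at the stage snapshot w_tilde.
        mu = sum(grad_i(w_tilde, i) for i in range(n)) / n
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient step.
            w -= lr * (grad_i(w, i) - grad_i(w_tilde, i) + mu)
        w_tilde = w                      # snapshot for the next stage
    return w_tilde
```

For example, for least-squares loss one would pass grad_i(w, i) = (x_i·w - y_i) x_i.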
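
The data sets in the Open Datasets row are distributed in LibSVM's sparse text format. A minimal loading sketch, assuming scikit-learn is installed and a local copy of the file exists under the placeholder name below:

```python
# Illustrative only: read a LibSVM-format data set such as Mnist8m or DNA.
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file("mnist8m.scale")   # placeholder local file name
print(X.shape, y.shape)                      # X is a sparse CSR matrix
```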
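
The Dataset Splits row quotes a 1% validation set used for tuning, but the paper does not say how the split is drawn. One possible reading, a plain random hold-out:

```python
# Assumed plain random 1% hold-out; the splitting scheme is not stated in the paper.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 20)               # placeholder features
y = np.random.randint(0, 2, size=1000)      # placeholder labels
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.01, random_state=0)   # 1% held out for validation
```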
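
The Software Dependencies row notes that the system is built in C++ on ZeroMQ. Purely to illustrate the kind of messaging involved (a worker sending an update, the server acknowledging it), here is a minimal request/reply sketch using pyzmq; the socket pattern, port, and message contents are assumptions, not the paper's protocol.

```python
# Minimal pyzmq request/reply sketch; not the paper's actual C++ protocol.
import zmq

def server(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(f"tcp://*:{port}")
    update = sock.recv_pyobj()             # e.g. a (parameter key, gradient) pair
    sock.send_pyobj({"ack": True})         # acknowledge the update

def worker(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(f"tcp://localhost:{port}")
    sock.send_pyobj(("w", [0.1, -0.2]))    # hypothetical update payload
    print(sock.recv_pyobj())
```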
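
The Experiment Setup row fixes S = 50 stages, τ = 128, and m = N/B with B roughly 10% of each worker's local shard. The arithmetic below spells out those settings under the assumption of even sharding across the 128 workers; the data set size is also an assumption.

```python
# Rough arithmetic for the quoted settings; N and even sharding are assumptions.
N = 8_100_000                  # e.g. roughly the size of Mnist8m (assumption)
num_workers = 128
tau = 128                      # delay bound fixed to the number of workers
S = 50                         # number of stages
local_n = N // num_workers     # samples per worker under even sharding
B = local_n // 10              # mini-batch size, about 10% of the local data
m = N // B                     # iterations per stage, m = N/B
print(local_n, B, m)           # 63281 6328 1280
```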