Asynchronous Distributed Semi-Stochastic Gradient Optimization

Authors: Ruiliang Zhang, Shuai Zheng, James T. Kwok

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on the Google Cloud Computing Platform demonstrate that the proposed algorithm outperforms state-of-the-art distributed asynchronous algorithms in terms of both wall clock time and solution quality."
Researcher Affiliation | Academia | "Ruiliang Zhang, Shuai Zheng, James T. Kwok, Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, {rzhangaf, szhengac, jamesk}@cse.ust.hk"
Pseudocode | Yes | "Algorithm 1: Stochastic variance reduced gradient (SVRG) (Johnson and Zhang 2013); Algorithm 2: Scheduler; Algorithm 3: Worker p receiving an update/evaluation task t at stage s; Algorithm 4: Daemon thread of the server; Algorithm 5: Computing thread of the server." (See the SVRG sketch after the table.)
Open Source Code | No | "The Petuum SGD code is downloaded from http://petuum.github.io/, while the other asynchronous algorithms are implemented in C++ by reusing most of our system's codes." No concrete access to the authors' own code is provided.
Open Datasets | Yes | "Experiments are performed on the Mnist8m and DNA data sets (Table 1) from the LibSVM archive [1] and Pascal Large Scale Learning Challenge [2]." [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ [2] http://largescale.ml.tu-berlin.de/ (See the data-loading sketch after the table.)
Dataset Splits | No | "All other parameters are tuned by a validation set, which is 1% of the data set." This mentions the use of a validation set but not explicit training/test splits. (See the validation-split sketch after the table.)
Hardware Specification | Yes | "Using the Google Cloud Computing Platform, we set up a cluster with 18 computing nodes. Each node is a Google Cloud n1-highmem-8 instance with eight cores and 52GB memory. Each scheduler/server takes one instance, while each worker takes a core."
Software Dependencies | No | "The system is implemented in C++, with the ZeroMQ package for communication." This mentions software names but no version numbers. (See the messaging sketch after the table.)
Experiment Setup | Yes | "We use 128 workers. To maximize parallelism, we fix τ to 128. (...) For distr-vr-sgd and distr-svrg, the number of stages is S = 50, and the number of iterations in each stage is m = N/B, where B is about 10% of each worker's local data set size. (...) the learning rate used by distr-svrg (as determined by the validation set) is small (10^-6 vs. 10^-3 in distr-vr-sgd)." (See the parameter sketch after the table.)
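
The Pseudocode row cites SVRG (Johnson and Zhang 2013) as Algorithm 1. For reference, here is a minimal single-machine SVRG sketch in Python; the gradient interface, default hyperparameters, and variable names are illustrative assumptions, and the distributed scheduler/worker/server machinery of Algorithms 2-5 is not reproduced.

```python
# Minimal single-machine SVRG sketch (Johnson and Zhang 2013); illustrative only.
import numpy as np

def svrg(grad_i, w0, n, num_stages=50, m=None, lr=1e-3, seed=0):
    """grad_i(w, i) returns the gradient of the i-th component function at w."""
    rng = np.random.default_rng(seed)
    m = m if m is not None else n        # iterations per stage (the paper uses m = N/B)
    w_tilde = np.asarray(w0, dtype=float).copy()
    for _ in range(num_stages):
        # Full gradient at the stage snapshot w_tilde.
        mu = sum(grad_i(w_tilde, i) for i in range(n)) / n
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient step.
            w -= lr * (grad_i(w, i) - grad_i(w_tilde, i) + mu)
        w_tilde = w                      # snapshot for the next stage
    return w_tilde
```

For example, for least-squares loss one would pass grad_i(w, i) = (x_i·w - y_i) x_i.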
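
The data sets in the Open Datasets row are distributed in LibSVM's sparse text format. A minimal loading sketch, assuming scikit-learn is installed and a local copy of the file exists under the placeholder name below:

```python
# Illustrative only: read a LibSVM-format data set such as Mnist8m or DNA.
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file("mnist8m.scale")   # placeholder local file name
print(X.shape, y.shape)                      # X is a sparse CSR matrix
```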
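
The Dataset Splits row quotes a 1% validation set used for tuning, but the paper does not say how the split is drawn. One possible reading, a plain random hold-out:

```python
# Assumed plain random 1% hold-out; the splitting scheme is not stated in the paper.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 20)               # placeholder features
y = np.random.randint(0, 2, size=1000)      # placeholder labels
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.01, random_state=0)   # 1% held out for validation
```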
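
The Software Dependencies row notes that the system is built in C++ on ZeroMQ. Purely to illustrate the kind of messaging involved (a worker sending an update, the server acknowledging it), here is a minimal request/reply sketch using pyzmq; the socket pattern, port, and message contents are assumptions, not the paper's protocol.

```python
# Minimal pyzmq request/reply sketch; not the paper's actual C++ protocol.
import zmq

def server(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(f"tcp://*:{port}")
    update = sock.recv_pyobj()             # e.g. a (parameter key, gradient) pair
    sock.send_pyobj({"ack": True})         # acknowledge the update

def worker(port=5555):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(f"tcp://localhost:{port}")
    sock.send_pyobj(("w", [0.1, -0.2]))    # hypothetical update payload
    print(sock.recv_pyobj())
```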
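
The Experiment Setup row fixes S = 50 stages, τ = 128, and m = N/B with B roughly 10% of each worker's local shard. The arithmetic below spells out those settings under the assumption of even sharding across the 128 workers; the data set size is also an assumption.

```python
# Rough arithmetic for the quoted settings; N and even sharding are assumptions.
N = 8_100_000                  # e.g. roughly the size of Mnist8m (assumption)
num_workers = 128
tau = 128                      # delay bound fixed to the number of workers
S = 50                         # number of stages
local_n = N // num_workers     # samples per worker under even sharding
B = local_n // 10              # mini-batch size, about 10% of the local data
m = N // B                     # iterations per stage, m = N/B
print(local_n, B, m)           # 63281 6328 1280
```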