Asynchronous Distributed Semi-Stochastic Gradient Optimization
Authors: Ruiliang Zhang, Shuai Zheng, James T. Kwok
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Google Cloud Computing Platform demonstrate that the proposed algorithm outperforms state-of-the-art distributed asynchronous algorithms in terms of both wall clock time and solution quality. |
| Researcher Affiliation | Academia | Ruiliang Zhang, Shuai Zheng, James T. Kwok Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong {rzhangaf, szhengac, jamesk}@cse.ust.hk |
| Pseudocode | Yes | Algorithm 1: Stochastic variance reduced gradient (SVRG) (Johnson and Zhang 2013); Algorithm 2: Scheduler; Algorithm 3: Worker p receiving an update/evaluation task t at stage s; Algorithm 4: Daemon thread of the server; Algorithm 5: Computing thread of the server. |
| Open Source Code | No | "The Petuum SGD code is downloaded from http://petuum.github.io/, while the other asynchronous algorithms are implemented in C++ by reusing most of our system's codes." No concrete access to *their* code is provided. |
| Open Datasets | Yes | Experiments are performed on the Mnist8m and DNA data sets (Table 1) from the LibSVM archive [1] and the Pascal Large Scale Learning Challenge [2]. [1] https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ [2] http://largescale.ml.tu-berlin.de/ |
| Dataset Splits | No | "All other parameters are tuned by a validation set, which is 1% of the data set." This mentions the use of a validation set but not explicit training/test splits. |
| Hardware Specification | Yes | Using the Google Cloud Computing Platform, we set up a cluster with 18 computing nodes. Each node is a Google Cloud n1-highmem-8 instance with eight cores and 52GB memory. Each scheduler/server takes one instance, while each worker takes a core. |
| Software Dependencies | No | "The system is implemented in C++, with the ZeroMQ package for communication." This mentions software names but no version numbers. |
| Experiment Setup | Yes | "We use 128 workers. To maximize parallelism, we fix τ to 128. (...) For distr-vr-sgd and distr-svrg, the number of stages is S = 50, and the number of iterations in each stage is m = N/B, where B is about 10% of each worker's local data set size. (...) the learning rate used by distr-svrg (as determined by the validation set) is small (10^-6 vs 10^-3 in distr-vr-sgd)." A minimal SVRG sketch using these hyperparameter names follows the table. |
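
The pseudocode entries reference SVRG (Johnson and Zhang 2013) together with the stage/mini-batch parameters S, m = N/B, and B quoted in the experiment setup. Below is a minimal, single-machine Python sketch of plain SVRG under those hyperparameter names. It is an illustration only, not the paper's asynchronous distributed variant; `grad_fn`, `batch_frac`, and the default learning rate are assumptions made for the example (the paper tunes the learning rate on a validation set).

```python
import numpy as np

def svrg(grad_fn, w0, data, n_stages=50, lr=1e-3, batch_frac=0.1, rng=None):
    """Minimal single-machine SVRG sketch (Johnson and Zhang 2013).

    grad_fn(w, batch) should return the average loss gradient over `batch`
    at parameter vector w.  Names mirror the paper's setup: n_stages = S,
    mini-batch size B ~ 10% of the data, m = N / B inner iterations per stage.
    """
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w0, dtype=float).copy()
    N = len(data)
    B = max(1, int(batch_frac * N))
    m = N // B
    for _ in range(n_stages):
        w_snap = w.copy()                  # snapshot kept fixed during the stage
        full_grad = grad_fn(w_snap, data)  # full gradient at the snapshot
        for _ in range(m):
            idx = rng.choice(N, size=B, replace=False)
            batch = [data[i] for i in idx]
            # variance-reduced stochastic gradient
            v = grad_fn(w, batch) - grad_fn(w_snap, batch) + full_grad
            w -= lr * v
    return w
```

In the paper's distributed setting, the inner loop above is instead executed asynchronously by 128 workers over their local data partitions, with the staleness bound fixed at τ = 128; this sketch only shows the sequential variance-reduction step that those algorithms build on.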