Asynchronous Accelerated Stochastic Gradient Descent
Authors: Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we tested AASGD on a few benchmark datasets. The experimental results verified our theoretical findings and indicated that AASGD could be a highly effective and efficient algorithm for practical use. |
| Researcher Affiliation | Collaboration | 1. School of Mathematical Sciences, Peking University (1501110036@pku.edu.cn); 2. Microsoft Research ({wche, taifengw, tie-yan.liu}@microsoft.com); 3. Fudan University (JingchengYu.94@gmail.com); 4. Academy of Mathematics and Systems Science, Chinese Academy of Sciences (mazm@amt.ac.cn) |
| Pseudocode | Yes | Algorithm 1 Asynchronous Accelerated SGD (AASGD) |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code. |
| Open Datasets | Yes | We conducted binary classification tasks on three benchmark datasets: rcv1, real-sim, news20... The detailed information about the three datasets can be found on the LibSVM website. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It only mentions the use of 'training error' as a stopping criterion. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not mention any software dependencies with specific version numbers. |
| Experiment Setup | Yes | In the AASGD algorithm, we set the number of block partitions as m = d/100, the mini-batch size as √n/P (P is the number of threads), and the inner loop K = 2mn. The stopping criterion in our experiments is a training error smaller than 10⁻¹⁰ (i.e., F(x_k) − F(x*) < 10⁻¹⁰). For the datasets we used, L_max = L_res ≤ 0.25 since the input data is normalized [Reddi et al., 2015], the delay is bounded by the number of threads (τ ≤ P), and µ = 1/√n = 0.01 [Shamir et al., 2014]. In SASGD and AASGD, we set step sizes η₀ = 0.2 and η = 0.1/P, which satisfy our assumptions in the theorems and corollaries. |
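
The Experiment Setup row maps directly onto code. Below is a minimal, hedged sketch of how the reported hyperparameters (m = d/100, mini-batch √n/P, K = 2mn, η = 0.1/P, µ = 1/√n, stopping at F(x_k) − F(x*) < 10⁻¹⁰) could be wired into an accelerated SGD loop. This is not the authors' implementation: the asynchronous, multi-threaded block updates of AASGD are replaced by a sequential Nesterov-style loop, and `grad_fn`, `loss_fn`, and `f_star` are assumed to be supplied by the caller.

```python
# Hedged sketch of the setup described in the table above, NOT the
# authors' AASGD code. Asynchrony is deliberately omitted; grad_fn,
# loss_fn, and f_star are illustrative assumptions.
import numpy as np

def aasgd_setup(n, d, P):
    """Derive the hyperparameters reported in the paper."""
    m = max(d // 100, 1)                 # number of coordinate-block partitions
    batch = max(int(np.sqrt(n)) // P, 1) # mini-batch size sqrt(n)/P per thread
    K = 2 * m * n                        # inner-loop length
    eta0, eta = 0.2, 0.1 / P             # step sizes for SASGD / AASGD
    mu = 1.0 / np.sqrt(n)                # strong-convexity estimate from the paper
    return m, batch, K, eta0, eta, mu

def accelerated_sgd(grad_fn, loss_fn, x0, n, P, f_star, tol=1e-10, seed=0):
    """Sequential stand-in for AASGD: momentum-accelerated mini-batch SGD,
    stopped once the training error drops below 1e-10 as in the paper."""
    rng = np.random.default_rng(seed)
    _, batch, K, _, eta, mu = aasgd_setup(n, x0.size, P)
    x, y = x0.copy(), x0.copy()
    # Standard Nesterov momentum coefficient for a mu-strongly-convex objective.
    beta = (1 - np.sqrt(mu * eta)) / (1 + np.sqrt(mu * eta))
    for _ in range(K):
        idx = rng.integers(0, n, size=batch)  # sample a mini-batch of indices
        g = grad_fn(y, idx)                   # mini-batch gradient at the extrapolated point
        x_new = y - eta * g                   # gradient step
        y = x_new + beta * (x_new - x)        # momentum extrapolation
        x = x_new
        if loss_fn(x) - f_star < tol:         # stopping criterion from the paper
            break
    return x
```

A faithful reproduction would instead run P lock-free threads, each reading a possibly stale shared iterate and writing back updates to one of the m coordinate blocks; the sequential loop above only illustrates how the reported constants fit together.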