Asynchronous Accelerated Stochastic Gradient Descent

Authors: Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we tested AASGD on a few benchmark datasets. The experimental results verified our theoretical findings and indicated that AASGD could be a highly effective and efficient algorithm for practical use."
Researcher Affiliation | Collaboration | (1) School of Mathematical Sciences, Peking University (1501110036@pku.edu.cn); (2) Microsoft Research ({wche, taifengw, tie-yan.liu}@microsoft.com); (3) Fudan University (Jingcheng.Yu.94@gmail.com); (4) Academy of Mathematics and Systems Science, Chinese Academy of Sciences (mazm@amt.ac.cn)
Pseudocode | Yes | Algorithm 1: Asynchronous Accelerated SGD (AASGD)
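The paper's Algorithm 1 is not reproduced in this report. As a rough illustration of the idea only — not the authors' algorithm — the following sketch combines a Nesterov-style accelerated update with lock-free asynchronous threads (Hogwild-style shared state); the function name, momentum constant, and update scheduling are all assumptions for illustration:

```python
import threading
import numpy as np

def async_accelerated_sgd_sketch(grad, x0, n_threads=2, lr=0.1,
                                 momentum=0.9, steps=200):
    """Illustrative sketch (NOT the paper's Algorithm 1): each thread
    reads possibly-stale shared state and applies a Nesterov-style
    momentum update to the shared parameters without locking."""
    x = x0.astype(float).copy()   # shared parameters, updated in place
    v = np.zeros_like(x)          # shared velocity, updated in place

    def worker():
        for _ in range(steps):
            lookahead = x + momentum * v   # Nesterov lookahead (stale read OK)
            g = grad(lookahead)
            v[:] = momentum * v - lr * g   # in-place: visible to all threads
            x[:] = x + v

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```

On a simple strongly convex quadratic such as f(x) = ½‖x‖² (gradient g(x) = x), the shared iterate contracts toward the minimizer even with racy updates, which is the intuition behind the lock-free design.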
Open Source Code | No | The paper provides no statement or link regarding the availability of its source code.
Open Datasets | Yes | "We conducted binary classification tasks on three benchmark datasets: rcv1, real-sim, news20... The detailed information about the three datasets can be found from the LibSVM website."
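The rcv1, real-sim, and news20 datasets are distributed on the LibSVM site in the sparse "label index:value" text format. A minimal parser for that format (a sketch, not the paper's data pipeline) looks like:

```python
def parse_libsvm(lines):
    """Parse LibSVM-format lines of the form '<label> <idx>:<val> ...'.
    Returns (labels, rows) where each row is a {feature_index: value} dict."""
    labels, rows = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        labels.append(float(parts[0]))
        rows.append({int(i): float(v)
                     for i, v in (p.split(":") for p in parts[1:])})
    return labels, rows
```

For example, `parse_libsvm(["+1 3:0.5 7:1.0", "-1 1:2.0"])` yields labels `[1.0, -1.0]` and two sparse feature dicts.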
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It only mentions the use of 'training error' as a stopping criterion.
Hardware Specification | No | The paper does not specify the hardware used to run its experiments.
Software Dependencies | No | The paper does not mention any software dependencies with specific version numbers.
Experiment Setup | Yes | "In the AASGD algorithm, we set the number of block partitions as m = d/100, the mini-batch size as √n/P (P is the number of threads), and the inner loop K = 2mn. The stopping criterion in our experiments is the training error being smaller than 10⁻¹⁰ (i.e., F(x_k) − F(x*) < 10⁻¹⁰). For the datasets we used, L_max = L_res < 0.25 since the input data is normalized [Reddi et al., 2015], and µ = 1/√n = 0.01 [Shamir et al., 2014]. In SASGD and AASGD, we set the initial stepsize to 0.2 and the stepsize to 0.1/P, which satisfy our assumptions in the theorems and corollaries."
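The setup above fully determines the hyperparameters from the data dimensions d, n and the thread count P. A small helper making that recipe concrete (the function name and the rounding of the mini-batch size are assumptions; the formulas are taken from the quoted setup):

```python
import math

def aasgd_hyperparams(d, n, P):
    """Derive AASGD hyperparameters from the paper's stated recipe:
    m = d/100 block partitions, mini-batch size sqrt(n)/P,
    inner loop K = 2*m*n, stepsizes 0.2 and 0.1/P."""
    m = d // 100                              # number of block partitions
    batch = max(1, round(math.sqrt(n) / P))   # mini-batch size sqrt(n)/P
    K = 2 * m * n                             # inner-loop length
    lr_init, lr = 0.2, 0.1 / P                # stepsizes from the setup
    return m, batch, K, lr_init, lr
```

Note that the quoted value µ = 1/√n = 0.01 corresponds to n = 10 000 training examples, so e.g. `aasgd_hyperparams(10000, 10000, 5)` gives m = 100, a mini-batch of 20, K = 2 000 000, and a stepsize of 0.02.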