Toward Understanding the Impact of Staleness in Distributed Machine Learning

Authors: Wei Dai, Yi Zhou, Nanqing Dong, Hao Zhang, Eric Xing

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments reveal the rich diversity of the effects of staleness on the convergence of ML algorithms and offer insights into seemingly contradictory reports in the literature. The empirical findings also inspire a new convergence analysis of SGD in non-convex optimization under staleness, matching the best-known convergence rate of O(1/√T) (see the rate sketch after the table).
Researcher Affiliation | Collaboration | Apple Inc., Duke University, and Petuum Inc.
Pseudocode | No | The paper describes the Async-SGD update rule as x_{k+1} = x_k − (η_k / |ξ(τ_k)|) ∇f_{ξ(τ_k)}(x_{τ_k}), which is a mathematical expression for an algorithm step, but it is not presented in a formal 'pseudocode' or 'algorithm' block (a runnable sketch of this stale update appears after the table).
Open Source Code | No | The paper does not provide any statement about releasing the source code for the methodology or a link to a code repository.
Open Datasets | Yes | Table 1: Overview of the models, algorithms... and dataset (Krizhevsky & Hinton, 2009; Marcus et al., 1993; LeCun, 1998; Harper & Konstan, 2016; Rennie) in our study. Datasets mentioned include CIFAR-10, Penn Treebank, MNIST, 20 Newsgroups, and MovieLens 1M.
Dataset Splits | No | The paper mentions using 'test accuracy' and 'test loss' for evaluation but does not specify the proportions or methodology for train/validation/test splits (e.g., an '80/10/10' split, or whether a separate validation set was used for hyperparameter tuning).
Hardware Specification | No | The paper discusses simulation on a 'single machine' and 'distributed machine learning systems' but does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for experiments.
Software Dependencies | No | The paper does not specify the versions of any programming languages, libraries, or frameworks used for implementation (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | Table 1: Overview of the models, algorithms... Key Parameters... η denotes the learning rate, which, if not specified, is tuned empirically for each algorithm and staleness level; β1, β2 are optimization hyperparameters; α, β in LDA are Dirichlet priors... We use batch size 32 for CNNs, DNNs, MLR, and VAEs... For MF, we use a batch size of 25,000 samples... For LDA we use D/(10P) as the batch size... (restated as a config sketch after the table).
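
For reference, the "best-known rate" cited in the Research Type row is the standard stationarity guarantee for SGD on smooth non-convex objectives, usually stated in terms of the average expected squared gradient norm. A generic form of that guarantee (not the paper's exact theorem or constants) is:

```latex
\min_{1 \le k \le T} \mathbb{E}\,\|\nabla f(x_k)\|^2
  \;\le\; \frac{1}{T}\sum_{k=1}^{T} \mathbb{E}\,\|\nabla f(x_k)\|^2
  \;=\; \mathcal{O}\!\left(\frac{1}{\sqrt{T}}\right)
```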
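
The Async-SGD update quoted in the Pseudocode row applies a mini-batch gradient evaluated at a delayed iterate x_{τ_k}. Below is a minimal NumPy sketch of SGD under simulated staleness; the toy least-squares objective, the uniform delay model, and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares objective f(x) = 0.5 * ||A x - b||^2 (illustrative only).
A = rng.normal(size=(256, 10))
b = rng.normal(size=256)

def minibatch_grad(x, idx):
    """Mini-batch gradient of f at x over the sampled rows idx."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def stale_sgd(steps=500, batch=32, eta=0.01, max_staleness=4):
    """SGD where each update uses a gradient computed at a delayed iterate x_{tau_k}."""
    x = np.zeros(A.shape[1])
    history = [x.copy()]                               # past iterates, so x_{tau_k} can be looked up
    for k in range(steps):
        delay = rng.integers(0, max_staleness + 1)     # staleness drawn uniformly (assumed delay model)
        tau_k = max(0, k - delay)
        idx = rng.choice(len(b), size=batch, replace=False)
        g = minibatch_grad(history[tau_k], idx)        # gradient at the stale iterate
        x = x - eta * g                                # x_{k+1} = x_k - eta_k * grad f_{xi(tau_k)}(x_{tau_k})
        history.append(x.copy())
    return x

x_hat = stale_sgd()
print("final loss:", 0.5 * np.mean((A @ x_hat - b) ** 2))
```

Setting max_staleness=0 recovers ordinary synchronous mini-batch SGD, which makes the sketch convenient for comparing convergence with and without delay.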
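
As a compact restatement of the quoted Experiment Setup row, here is a hypothetical configuration dictionary; the keys and structure are my own and do not correspond to the authors' actual configuration files.

```python
# Hypothetical restatement of the quoted setup (Table 1 of the paper).
EXPERIMENT_SETUP = {
    "CNN": {"batch_size": 32},
    "DNN": {"batch_size": 32},
    "MLR": {"batch_size": 32},
    "VAE": {"batch_size": 32},
    "MF":  {"batch_size": 25000},
    "LDA": {"batch_size": "D / (10 * P)",        # D and P as defined in the paper
            "dirichlet_priors": ("alpha", "beta")},
    # Learning rate eta: tuned empirically for each algorithm and staleness level.
}
```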