Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?

Authors: Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter Glynn, Yinyu Ye, Li-Jia Li, Li Fei-Fei

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Numerical results. To validate our analysis, we test the convergence of Algorithm 2 against a standard Rosenbrock test function with d = 101 degrees of freedom... Our results are shown in Fig. 1. Starting from a random (but otherwise fixed) initial condition, we ran S = 100 realizations of DASGD (with and without delays). We then plotted a randomly chosen trajectory ("test sample" in Fig. 1), the sample average, and the min/max over all samples at every update epoch.
Researcher Affiliation | Collaboration | ¹Stanford University, Stanford, USA; ²Univ. Grenoble Alpes, CNRS, Inria, LIG, 38000 Grenoble, France; ³Google, Mountain View, USA.
Pseudocode | Yes | Algorithm 1 (Running SGD on a Master-Slave Architecture), Algorithm 2 (Distributed Asynchronous Stochastic Gradient Descent), Algorithm 3 (Master's DAGD Update)
Open Source Code | No | No explicit statement or link indicating the availability of open-source code for the methodology described in the paper.
Open Datasets | Yes | To validate our analysis, we test the convergence of Algorithm 2 against a standard Rosenbrock test function with d = 101 degrees of freedom, i.e., $\sum_{i=1}^{100} \left[ 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right]$, with $x_i \in [0, 2]$, $i = 1, \dots, 101$. (A minimal sketch of this objective appears after the table.)
Dataset Splits | No | The paper describes numerical experiments on a mathematical test function but does not specify train/validation/test dataset splits in the conventional machine learning sense.
Hardware Specification | No | The paper mentions numerical experiments but does not provide any specific hardware details like GPU/CPU models or cloud resources used.
Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers.
Experiment Setup | Yes | In both cases, Algorithm 2 was run with a decreasing step-size of the form $\alpha_n \propto 1/(n \log n)$ and stochastic gradients drawn from a standard multivariate Gaussian distribution (i.e., zero mean and identity covariance matrix). (A minimal simulation sketch of this setup follows the table.)
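
Below is a minimal sketch of the Rosenbrock objective and its gradient as used in the paper's numerical test, with d = 101 degrees of freedom and components constrained to [0, 2]. This is not the authors' code: it assumes NumPy, and the names `rosenbrock` and `rosenbrock_grad` are illustrative.

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock test function with d = len(x) degrees of freedom:
    sum_{i=1}^{d-1} [100*(x_{i+1} - x_i^2)^2 + (1 - x_i)^2]."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rosenbrock_grad(x):
    """Analytical gradient of the Rosenbrock function above."""
    g = np.zeros_like(x)
    # Contribution of term i to the partial derivative w.r.t. x_i ...
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    # ... and contribution of term i to the partial derivative w.r.t. x_{i+1}.
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g

# d = 101 degrees of freedom, components drawn from the feasible box [0, 2].
x0 = np.random.default_rng(0).uniform(0.0, 2.0, size=101)
print(rosenbrock(x0))
```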
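The next sketch is a hedged, serial emulation of the experiment-setup row: gradients are evaluated at iterates up to a fixed number of epochs old, perturbed by zero-mean, identity-covariance Gaussian noise (one reading of the paper's statement about the stochastic gradients), with step-size decreasing as 1/(n log n) and projection back onto the box [0, 2]^101. The function name `dasgd_simulation`, the `max_delay` parameter, and the delay model are assumptions for illustration, not the authors' Algorithm 2; it reuses the `rosenbrock_grad` sketch above.

```python
import numpy as np

def dasgd_simulation(grad_fn, x0, num_iters=20_000, max_delay=0, noise_std=1.0, seed=0):
    """Serial emulation of stochastic gradient descent with bounded random delays:
    at epoch n, the applied gradient was evaluated at an iterate up to `max_delay`
    epochs old, perturbed by zero-mean Gaussian noise, with step-size ~ 1/(n log n)
    and Euclidean projection onto the box [0, 2]^d. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    history = [x.copy()]  # past iterates, standing in for slow workers' stale reads
    for n in range(1, num_iters + 1):
        delay = int(rng.integers(0, max_delay + 1))
        stale_x = history[max(0, len(history) - 1 - delay)]
        noisy_grad = grad_fn(stale_x) + noise_std * rng.standard_normal(x.shape)
        alpha = 1.0 / (n * np.log(n + 1.0))  # log(n + 1) avoids division by zero at n = 1
        x = np.clip(x - alpha * noisy_grad, 0.0, 2.0)  # projection onto [0, 2]^d
        history.append(x.copy())
        if len(history) > max_delay + 1:
            history.pop(0)
    return x

# Example usage mirroring the paper's setup: runs with and without delays
# from the same (random but fixed) initial condition.
x0 = np.random.default_rng(1).uniform(0.0, 2.0, size=101)
x_no_delay = dasgd_simulation(rosenbrock_grad, x0, max_delay=0)
x_delayed = dasgd_simulation(rosenbrock_grad, x0, max_delay=50)
```

Repeating such runs S = 100 times with different seeds and plotting a single trajectory, the sample average, and the min/max envelope would reproduce the kind of summary the paper reports in Fig. 1.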