Delay-Adaptive Distributed Stochastic Optimization
Authors: Zhaolin Ren, Zhengyuan Zhou, Linhai Qiu, Ajay Deshpande, Jayant Kalagnanam
AAAI 2020, pp. 5503-5510
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 6 (Numerical Results): "We first verify convergence of delay-adaptive DASGD on a standard Rosenbrock test function with d = 101 degrees of freedom, ... We also compared the accuracies of a logistic regression model learned using the delay-adaptive vs. non-delay-adaptive DASGD algorithms on the MNIST dataset (LeCun 1998), a standard benchmark for machine learning tasks." |
| Researcher Affiliation | Collaboration | Zhaolin Ren (1), Zhengyuan Zhou (2,4), Linhai Qiu (3), Ajay Deshpande (4), Jayant Kalagnanam (4); affiliations: (1) Harvard University, (2) New York University, (3) Google Inc., (4) IBM Research |
| Pseudocode | Yes | Algorithm 1: Distributed Asynchronous Stochastic Gradient Descent (Require: Y0 ∈ R^d, ...); Algorithm 2: Delay-Adaptive DASGD (Require: Y0 ∈ R^d, ...); Algorithm 3: DASGD-T (Require: initial state X0 ∈ R^d, step-size sequence αn, initial truncation parameter τ0) |
| Open Source Code | No | The paper does not include any statement about making its source code publicly available or provide a link to a code repository. |
| Open Datasets | Yes | MNIST test set: "We also compared the accuracies of a logistic regression model learned using the delay-adaptive vs. non-delay-adaptive DASGD algorithms on the MNIST dataset (LeCun 1998), a standard benchmark for machine learning tasks." |
| Dataset Splits | No | The paper mentions training models and evaluating on 'test accuracy' but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running experiments. It mentions distributed architectures and number of workers, but not specific hardware components. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation or experiments. |
| Experiment Setup | Yes | "In all the cases with different delay functions, we use the same initial condition, which was generated randomly but fixed throughout all the runs. For all the cases, we drew noise from a standard multivariate Gaussian distribution when computing the gradient at each time step. We ran 10 trials and averaged the results for each case. ... In the non-delay-adaptive cases, we used the fixed step size of 1e-4 as the baseline. In the delay-adaptive cases, we use the step size 1e-4 / (n log n log log n + n^(c log(n)/log s(n))), where c = 1/2 and s(n) is the delay at step n; when the delay is less than 10, we use the non-adaptive step size of 1e-4 in order to allow faster convergence when the delay is small." (A code sketch of this schedule follows the table.) |
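To make the delay-adaptive schedule quoted above concrete, here is a minimal Python sketch, assuming the denominator groups as n·log(n)·log(log(n)) + n^(c·log(n)/log s(n)) and that s(n) is the delay of the gradient applied at step n. The function name, the `delay_threshold` parameter, and the toy update loop are illustrative assumptions reconstructed from the quote, not the authors' implementation (no source code is released, per the Open Source Code row above).

```python
import numpy as np

def delay_adaptive_step(n, delay, base_lr=1e-4, c=0.5, delay_threshold=10):
    """Delay-adaptive step size, reconstructed from the quoted schedule.

    Assumed grouping (not confirmed against the authors' code):
        base_lr / (n * log(n) * log(log(n)) + n ** (c * log(n) / log(delay)))
    For delays below delay_threshold the fixed baseline step size is used,
    matching the paper's fallback for small delays.
    """
    if delay < delay_threshold:
        return base_lr
    log_n = np.log(max(n, 3))        # guard: log(log(n)) requires n > e
    log_s = np.log(max(delay, 2))    # guard: log(1) = 0 would divide by zero
    return base_lr / (n * log_n * np.log(log_n) + n ** (c * log_n / log_s))

# Toy usage: SGD on a noisy quadratic f(x) = ||x||^2 with random delays.
# Only the delay-dependent step size is simulated here; gradient staleness
# itself is not modeled in this sketch.
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
for n in range(1, 1001):
    delay = int(rng.integers(0, 50))           # simulated worker delay
    grad = 2 * x + rng.standard_normal(5)      # true gradient + Gaussian noise
    x -= delay_adaptive_step(n, delay) * grad
```

The small-delay fallback mirrors the quoted setup: switching to the fixed 1e-4 step when the delay is under 10 avoids needlessly shrinking the step size when staleness is mild, while the adaptive denominator dampens updates as delays grow.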