Delay-Adaptive Distributed Stochastic Optimization

Authors: Zhaolin Ren, Zhengyuan Zhou, Linhai Qiu, Ajay Deshpande, Jayant Kalagnanam

AAAI 2020, pp. 5503-5510

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 6 (Numerical Results): "We first verify convergence of delay-adaptive DASGD on a standard Rosenbrock test function with d = 101 degrees of freedom, ..." and "We also compared the accuracies of a logistic regression model learned using the delay-adaptive vs non-delay-adaptive DASGD algorithms on the MNIST dataset (Le Cun 1998), a standard benchmark for machine learning tasks." (A sketch of the Rosenbrock test appears after the table.)
Researcher Affiliation | Collaboration | Zhaolin Ren (1), Zhengyuan Zhou (2,4), Linhai Qiu (3), Ajay Deshpande (4), Jayant Kalagnanam (4); affiliations: (1) Harvard University, (2) New York University, (3) Google Inc., (4) IBM Research
Pseudocode | Yes | Algorithm 1: Distributed Asynchronous Stochastic Gradient Descent (Require: Y_0 ∈ R^d, ...); Algorithm 2: Delay-Adaptive DASGD (Require: Y_0 ∈ R^d, ...); Algorithm 3: DASGD-T (Require: initial state X_0 ∈ R^d, step-size sequence α_n, initial truncation parameter τ_0). (A minimal update-loop sketch appears after the table.)
Open Source Code | No | The paper does not include any statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | MNIST (test set): "We also compared the accuracies of a logistic regression model learned using the delay-adaptive vs non-delay-adaptive DASGD algorithms on the MNIST dataset (Le Cun 1998), a standard benchmark for machine learning tasks."
Dataset Splits | No | The paper mentions training models and reporting 'test accuracy', but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running experiments. It mentions distributed architectures and the number of workers, but not specific hardware components.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation or experiments.
Experiment Setup | Yes | "In all the cases with different delay functions, we use the same initial condition that was generated randomly but fixed throughout all the runs. For all the cases, we drew noise from a standard multivariate Gaussian distribution when computing the gradient of each time step. We ran 10 trials and averaged the results for each case. ... In the non-delay-adaptive cases, we used the fixed step size of 1e-4 as the baseline. In the delay-adaptive cases, we use the step size 1e-4 / (n log n log log n + n^c (log(n) / log s(n))), where c = 1/2, but when the delay is less than 10, we use the non-adaptive step size of 1e-4 in order to allow faster convergence when the delay is small." (The step-size rule is sketched in code after the table.)
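
For concreteness, the convergence test quoted under Research Type and Experiment Setup can be set up as below: the standard d = 101 Rosenbrock objective with gradients perturbed by standard Gaussian noise. Only the dimension and the noise distribution come from the quoted text; the exact Rosenbrock variant, noise scale, and initialization used in the paper are assumptions of this sketch.

    import numpy as np

    def rosenbrock(x):
        # Standard chained Rosenbrock: sum_i 100*(x[i+1] - x[i]^2)^2 + (1 - x[i])^2.
        return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

    def rosenbrock_grad(x):
        g = np.zeros_like(x)
        # Derivative contributions to x[i] from term i and to x[i+1] from term i.
        g[:-1] += -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
        g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
        return g

    def noisy_grad(x, rng):
        # The quoted setup adds standard multivariate Gaussian noise to each gradient.
        return rosenbrock_grad(x) + rng.standard_normal(x.shape)

    d = 101                          # degrees of freedom quoted in the paper
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal(d)      # stand-in for the paper's fixed random initial condition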
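
Algorithm 2 (Delay-Adaptive DASGD) is only described in pseudocode. The single-process simulation below is a hedged sketch of the asynchronous update loop, not the authors' implementation; grad_fn, delay_fn, and step_fn are hypothetical arguments standing in for the stochastic gradient oracle, the delay process s(n), and the delay-dependent step-size rule.

    import numpy as np

    def delay_adaptive_sgd(grad_fn, x0, steps, delay_fn, step_fn, rng=None):
        # At iteration n the "master" applies a gradient that a worker computed at an
        # iterate that is delay_fn(n) updates old; step_fn(n, delay) shrinks with delay.
        rng = np.random.default_rng(0) if rng is None else rng
        x = np.asarray(x0, dtype=float).copy()
        history = [x.copy()]                        # past iterates, to emulate staleness
        for n in range(1, steps + 1):
            s = min(delay_fn(n), len(history) - 1)  # observed delay of this gradient
            g = grad_fn(history[-1 - s], rng)       # stochastic gradient at a stale iterate
            x = x - step_fn(n, s) * g
            history.append(x.copy())
        return x

Paired with the Rosenbrock oracle above (grad_fn=noisy_grad), different delay processes can be swapped in through delay_fn, and step_fn carries the delay-adaptive rule sketched next.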
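
The step-size rule quoted under Experiment Setup can be read as the following helper. Rewriting "1e 4" as 1e-4 and "nc" as n^c follows directly from the quote (c = 1/2 is stated there); treating s(n) as the delay observed at iteration n is an assumption, and the adaptive branch is only well defined once n is large enough that log(log n) is positive.

    import math

    def step_size(n, delay, base_lr=1e-4, c=0.5):
        # Fall back to the fixed (non-adaptive) step when the delay is small, as reported.
        if delay < 10:
            return base_lr
        # Delay-adaptive step: 1e-4 / (n*log(n)*log(log(n)) + n^c * log(n)/log(s(n))),
        # with c = 1/2. Requires n >= 3 so log(log(n)) > 0; delay >= 10 keeps log(delay) > 0.
        denom = n * math.log(n) * math.log(math.log(n)) + (n ** c) * (math.log(n) / math.log(delay))
        return base_lr / denom

Used as step_fn in the loop above, this gives the delay-adaptive runs; step_fn = lambda n, s: 1e-4 recovers the non-adaptive baseline.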