Delayed Algorithms for Distributed Stochastic Weakly Convex Optimization

Authors: Wenzhi Gao, Qi Deng

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our numerical experiments further confirm the empirical superiority of our proposed methods. |
| Researcher Affiliation | Academia | Wenzhi Gao, Stanford University, gwz@stanford.edu; Qi Deng, Shanghai University of Finance and Economics, qideng@sufe.edu.cn |
| Pseudocode | Yes | Algorithm 1: Delayed stochastic proximal subgradient method; Algorithm 2: Delayed stochastic prox-linear method; Algorithm 3: Safeguarded DSGD/DSPL |
| Open Source Code | No | The paper does not contain any explicit statement or link providing access to the source code for the methodology described. |
| Open Datasets | Yes | The real-life data is generated from zipcode dataset, where we vectorize a 16 × 16 hand-written digit from [16] |
| Dataset Splits | No | The paper specifies running for "400 epochs (K = 400m)" and a stopping criterion of "f < 1.5 f(x̂)", but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for reproducibility. |
| Hardware Specification | Yes | Our first experiment runs in an asynchronous environment implemented by MPI Python interface and is profiled on an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz machine with 10 cores and 20 threads. |
| Software Dependencies | No | The paper mentions using "MPI Python interface" but does not specify version numbers for either MPI or Python. It also states that "numerical linear algebra on the worker uses a raw implementation (not importing package)", indicating that no specific library dependencies with versions are given. |
| Experiment Setup | Yes | 2) Initial point and radius. Synthetic data: we generate x ∼ N(0, I_n) and start from x1 = x/‖x‖; zipcode data: we generate x ∼ N(x̂, I_n) and take x1 = 10x. M = 1000‖x1‖. 3) Stopping criterion. We run algorithms for 400 epochs (K = 400m). ... 4) Stepsize. We set γ = K/α, where α ∈ {0.1, 0.5, 1.0} in the asynchronous environment, α ∈ [10^-2, 10^1] for synthetic data and α ∈ [10^1, 10^2] for the zipcode dataset. 5) Simulated delay. In the simulated environment, we generate τ_k from two common distributions from literature, which are geometric G(p) and Poisson P(λ) [37]. After the delay is generated, it is truncated by twice the mean of the distribution. |
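
The simulated-delay step in the Experiment Setup row (draw τ_k from a geometric or Poisson distribution, then truncate at twice the mean) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the geometric distribution is assumed to be supported on {1, 2, ...} with mean 1/p, and the cap is assumed to be the integer part of twice the mean. The paper does not state any of these conventions.

```python
import math
import random


def sample_geometric(p, rng):
    """Geometric draw on {1, 2, ...} (mean 1/p) via inverse transform.

    Assumption: the support starts at 1; the source does not say which
    convention G(p) follows.
    """
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return int(math.floor(math.log(u) / math.log(1.0 - p))) + 1


def sample_poisson(lam, rng):
    """Poisson draw with mean lam using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def truncated_delay(dist, param, rng):
    """Sample one delay and cap it at twice the distribution mean.

    Assumption: the cap is int(2 * mean); the source only says the delay
    "is truncated by twice the mean of the distribution".
    """
    if dist == "geometric":
        raw, mean = sample_geometric(param, rng), 1.0 / param
    elif dist == "poisson":
        raw, mean = sample_poisson(param, rng), float(param)
    else:
        raise ValueError(f"unknown delay distribution: {dist}")
    return min(raw, int(2 * mean))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same delays
    print([truncated_delay("geometric", 0.1, rng) for _ in range(10)])
    print([truncated_delay("poisson", 5.0, rng) for _ in range(10)])
```

Only the standard library is used here, so the samplers are written out by hand. With p = 0.1 the cap is 20, and with λ = 5 it is 10, so no simulated delay exceeds those values.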