Delayed Algorithms for Distributed Stochastic Weakly Convex Optimization
Authors: Wenzhi Gao, Qi Deng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments further confirm the empirical superiority of our proposed methods. |
| Researcher Affiliation | Academia | Wenzhi Gao Stanford University gwz@stanford.edu Qi Deng Shanghai University of Finance and Economics qideng@sufe.edu.cn |
| Pseudocode | Yes | Algorithm 1: Delayed stochastic proximal subgradient method; Algorithm 2: Delayed stochastic prox-linear method; Algorithm 3: Safeguarded DSGD/DSPL |
| Open Source Code | No | The paper does not contain any explicit statement or link providing access to the source code for the methodology described. |
| Open Datasets | Yes | The real-life data is generated from zipcode dataset, where we vectorize a 16 × 16 hand-written digit from [16] |
| Dataset Splits | No | The paper specifies running for "400 epochs (K = 400m)" and a stopping criterion of "f < 1.5f(x̂)", but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for reproducibility. |
| Hardware Specification | Yes | Our first experiment runs in an asynchronous environment implemented by MPI Python interface and is profiled on an Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz machine with 10 cores and 20 threads. |
| Software Dependencies | No | The paper mentions using "MPI Python interface" but does not specify exact version numbers for either MPI or Python. It also states that "numerical linear algebra on the worker uses a raw implementation (not importing package)", indicating a lack of specific library dependencies with versions. |
| Experiment Setup | Yes | 2) Initial point and radius. Synthetic data: we generate x ∼ N(0, I_n) and start from x1 = x/‖x‖; zipcode data: we generate x ∼ N(x̂, I_n) and take x1 = 10x. M = 1000‖x1‖. 3) Stopping criterion. We run algorithms for 400 epochs (K = 400m). ... 4) Stepsize. We set γ = K/α, where α ∈ {0.1, 0.5, 1.0} in the asynchronous environment, α ∈ [10⁻², 10¹] for synthetic data and α ∈ [10¹, 10²] for the zipcode dataset. 5) Simulated delay. In the simulated environment, we generate τk from two common distributions from literature, which are geometric G(p) and Poisson P(λ) [37]. After the delay is generated, it is truncated by twice the mean of the distribution. |
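
The simulated-delay setup quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code (the paper releases none). It assumes the geometric distribution is supported on {1, 2, ...} with mean 1/p, and it reads "truncated by twice the mean" as clipping each draw at 2 × mean. The function names `sample_geometric`, `sample_poisson`, and `simulate_delays` are hypothetical.

```python
import math
import random


def sample_geometric(p, rng):
    """Draw from a geometric distribution on {1, 2, ...} with mean 1/p.

    Assumption: the paper does not state which support convention G(p) uses.
    """
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return int(math.floor(math.log(u) / math.log(1.0 - p))) + 1


def sample_poisson(lam, rng):
    """Draw from a Poisson distribution with mean lam (Knuth's method)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_delays(kind, param, num_steps, seed=0):
    """Generate num_steps delays, each clipped at twice the distribution mean.

    Assumption: "truncated" is interpreted as min(delay, 2 * mean).
    """
    rng = random.Random(seed)
    if kind == "geometric":
        mean = 1.0 / param
        draw = lambda: sample_geometric(param, rng)
    elif kind == "poisson":
        mean = param
        draw = lambda: sample_poisson(param, rng)
    else:
        raise ValueError("kind must be 'geometric' or 'poisson'")
    cap = int(math.floor(2 * mean))
    return [min(draw(), cap) for _ in range(num_steps)]
```

For example, `simulate_delays("poisson", 5.0, 1000)` yields 1000 delays, none larger than 10. Clipping, rather than resampling draws that exceed the cap, is a design choice made here for simplicity; the quoted text does not say which the authors used.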