Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Authors: Hadrien Hendrikx, Francis Bach, Laurent Massoulié

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "illustrate its effectiveness with simulations on real data." and "We investigate in this section the practical performances of DVR. We solve a regularized logistic regression problem on the RCV1 dataset [18] (d = 47236) with n = 81 (leading to m = 2430) and two different graph topologies: an Erdős-Rényi random graph (see, e.g., [3]) and a grid."
Researcher Affiliation | Academia | Hadrien Hendrikx (INRIA, DIENS, PSL Research University, hadrien.hendrikx@inria.fr); Francis Bach (INRIA, DIENS, PSL Research University, francis.bach@inria.fr); Laurent Massoulié (INRIA, DIENS, PSL Research University, laurent.massoulie@inria.fr)
Pseudocode | Yes | Algorithm 1 DVR(z_0) (a NumPy sketch is given after this table):
1: α = 2 λ_min^+(A_comm D_M^{-1} A_comm^T), η = min( p_comm / λ_max(A_comm^T Σ_comm A_comm), min_{ij} p_ij / (α (1 + σ_i^{-1} L_ij)) )
2: θ_0^{(i)} = (Σ_{j=1}^m ∇f_ij(z_0^{(ij)})) / σ_i // z_0 is arbitrary, but θ_0 is not.
3: for t = 0 to K − 1 do // Run for K iterations
4:   Sample u_t uniformly in [0, 1]. // Randomly decide the kind of update
5:   if u_t ≤ p_comm then
6:     θ_{t+1} = θ_t − (η / p_comm) Σ^{-1} W θ_t // Communication using W
7:   else
8:     for i = 1 to n do
9:       Sample j ∈ {1, ..., m} with probability p_ij.
10:      z_{t+1}^{(ij')} = z_t^{(ij')} for all j' ≠ j // Only one virtual node is updated
11:      z_{t+1}^{(ij)} = (1 − αη/p_ij) z_t^{(ij)} + (αη/p_ij) θ_t^{(i)} // Virtual node update
12:      θ_{t+1}^{(i)} = θ_t^{(i)} − σ_i^{-1} (∇f_ij(z_{t+1}^{(ij)}) − ∇f_ij(z_t^{(ij)})) // Local update using ∇f_ij
13: return θ_K
Open Source Code | Yes | "Further experimental results are given in Appendix D, and the code is available in supplementary material and at https://github.com/HadrienHx/DVR_NeurIPS."
Open Datasets | Yes | "We solve a regularized logistic regression problem on the RCV1 dataset [18] (d = 47236) with n = 81 (leading to m = 2430) and two different graph topologies: an Erdős-Rényi random graph (see, e.g., [3]) and a grid."
Dataset Splits | No | The paper uses the RCV1 dataset but does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or cross-validation folds) needed for reproduction.
Hardware Specification | No | "All other plots are taken with respect to (simulated) time (i.e., computing f_ij takes time 1 and multiplying by W takes time τ), with τ = 250, in order to report results that are independent of the computing cluster hardware and status." This indicates that no specific hardware was detailed (a simulated-clock sketch is given after this table).
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | "All parameters are chosen according to theory, except for the smoothness of the f_i, which requires finding the smallest eigenvalue of a d × d matrix. For this, we start with L_b = σ_i + Σ_{j=1}^m L_ij (which is a known upper bound), and decrease it while convergence is ensured, leading to κ_b = 0.01 κ_s. The parameters for accelerated EXTRA are chosen as in [20], since tuning the number of inner iterations does not significantly improve the results (at the cost of a high tuning effort). For accelerated DVR, we set the number of inner iterations to N/p_comp (one pass over the local dataset). We use Chebyshev acceleration for (accelerated) DVR but not for (accelerated) EXTRA, since it is actually slower, as predicted by the theory." and "We choose µ_k² = 1/2 for all communication edges, so the gossip matrix W is the Laplacian of the graph." (A sketch of this gossip matrix construction is given after this table.)
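
The following is a minimal NumPy sketch of Algorithm 1 from the Pseudocode row above. It assumes the step sizes α and η and the probabilities p_comm and p_ij have already been computed from the spectral quantities in line 1; the grads callables, the array shapes, and the function name dvr are our own illustrative conventions, not taken from the released code.

import numpy as np

def dvr(z0, W, grads, sigma, alpha, eta, p_comm, p_ij, K, seed=0):
    """Minimal sketch of DVR (Algorithm 1), under the assumptions stated above.

    grads[i][j](z) returns the gradient of f_ij at z; W is the (n, n) gossip
    matrix; sigma is the (n,) array of local regularization constants; alpha,
    eta, p_comm and the (n, m) array p_ij are assumed precomputed (line 1).
    """
    rng = np.random.default_rng(seed)
    n, m = p_ij.shape
    z = np.tile(z0, (n, m, 1))  # virtual-node variables z^{(ij)}, shape (n, m, d)
    # Line 2: theta_0^{(i)} = (sum_{j=1}^m grad f_ij(z_0^{(ij)})) / sigma_i
    theta = np.stack([sum(grads[i][j](z[i, j]) for j in range(m)) / sigma[i]
                      for i in range(n)])
    for _ in range(K):  # lines 3-12
        if rng.uniform() <= p_comm:  # lines 4-5: pick the kind of update
            # Line 6: communication step, theta <- theta - (eta/p_comm) Sigma^{-1} W theta
            theta -= (eta / p_comm) * (W @ theta) / sigma[:, None]
        else:
            for i in range(n):
                j = rng.choice(m, p=p_ij[i])  # line 9: sample one virtual node
                w = alpha * eta / p_ij[i, j]
                z_new = (1 - w) * z[i, j] + w * theta[i]  # line 11: virtual node update
                # Line 12: local update using a single gradient difference
                theta[i] -= (grads[i][j](z_new) - grads[i][j](z[i, j])) / sigma[i]
                z[i, j] = z_new  # line 10: all other virtual nodes stay unchanged
    return theta  # line 13: theta_K

For the experiments quoted above, each grads[i][j] would wrap one regularized logistic loss term on node i's shard of RCV1.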
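The simulated-time convention quoted in the Hardware Specification row can be reproduced with a few lines of bookkeeping. The class below is a hypothetical helper (not from the authors' code) in which every local gradient evaluation costs 1 time unit and every multiplication by W costs τ = 250.

class SimClock:
    """Hypothetical simulated clock: gradients cost 1, communications cost tau."""

    def __init__(self, tau=250.0):
        self.tau = tau   # cost of one multiplication by the gossip matrix W
        self.time = 0.0  # simulated elapsed time, hardware-independent

    def gradient(self, k=1):
        """Account for k evaluations of some f_ij (cost 1 each)."""
        self.time += k

    def communication(self):
        """Account for one multiplication by W (cost tau)."""
        self.time += self.tau

Plotting suboptimality against clock.time then yields curves that do not depend on the computing cluster's hardware or load, as the paper intends.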
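Finally, the gossip matrix from the Experiment Setup row (µ_k² = 1/2 on every edge, making W the graph Laplacian) and the two topologies can be instantiated as below. The use of networkx and the Erdős-Rényi edge probability are our assumptions; the paper reports neither.

import networkx as nx
import numpy as np

n = 81  # number of nodes used in the experiments

# Two topologies from the paper: an Erdős-Rényi random graph and a grid.
# The ER edge probability below is a guess; the paper does not report it.
G_er = nx.erdos_renyi_graph(n, p=2 * np.log(n) / n, seed=0)
G_grid = nx.grid_2d_graph(9, 9)  # a 9 x 9 grid also has n = 81 nodes

# With mu_k^2 = 1/2 on all communication edges, the quoted setup makes the
# gossip matrix W the Laplacian of the graph.
W = nx.laplacian_matrix(G_er).toarray().astype(float)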