Communication-Efficient Distributed Dual Coordinate Ascent

Authors: Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose a communication-efficient framework, COCOA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that as compared to state-of-the-art mini-batch versions of SGD and SDCA algorithms, COCOA converges to the same .001-accurate solution quality on average 25× as quickly.
Researcher Affiliation | Academia | Martin Jaggi (ETH Zurich), Virginia Smith (UC Berkeley), Martin Takáč (Lehigh University), Jonathan Terhorst (UC Berkeley), Sanjay Krishnan (UC Berkeley), Thomas Hofmann (ETH Zurich), Michael I. Jordan (UC Berkeley)
Pseudocode | Yes | Algorithm 1: COCOA: Communication-Efficient Distributed Dual Coordinate Ascent. (A hedged sketch of this outer loop appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | Table 1: Datasets for Empirical Study. cov: n = 522,911 training points, d = 54 features, sparsity 22.22%, λ = 1e-6, K = 4 workers; rcv1: n = 677,399, d = 47,236, sparsity 0.16%, λ = 1e-6, K = 8 workers; imagenet: n = 32,751, d = 160,000, sparsity 100%, λ = 1e-5, K = 32 workers.
Dataset Splits | No | The paper mentions data distribution across workers and dataset sizes but does not specify train/validation/test splits (e.g., percentages or exact counts) for reproduction.
Hardware Specification | Yes | We apply these algorithms to standard hinge loss ℓ2-regularized support vector machines, using implementations written in Spark on m1.large Amazon EC2 instances [1]. (The hinge-loss SVM objective is written out below the table for reference.)
Software Dependencies | No | The paper mentions Spark as the implementation platform but does not specify version numbers for Spark or any other software libraries or dependencies.
Experiment Setup | Yes | For each algorithm, we additionally study the effect of scaling the average by a parameter βK... The datasets used in these analyses are summarized in Table 1, and were distributed among K = 4, 8, and 32 nodes, respectively. We use the same regularization parameters as specified in [16, 17]. In Figure 3 we explore the effect of H, the computation-communication trade-off factor, on the convergence of COCOA for the Cov dataset on a cluster of 4 nodes. In Figure 4, we attempt to scale the averaging step of each algorithm by using various βK values, for two different batch sizes on the Cov dataset (H = 1e5 and H = 100). (The roles of H and βK are illustrated in the sketch below the table.)
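
For context on what the dual coordinate ascent operates on: the hinge-loss ℓ2-regularized SVM referenced in the Hardware Specification row has a standard primal-dual pair, written here in common SDCA-style notation as a sketch; the paper's own notation may differ slightly.

```latex
% Primal: hinge-loss SVM with L2 regularization (loss averaged over the n examples)
\[
\min_{w \in \mathbb{R}^d} \; P(w) = \frac{\lambda}{2}\|w\|^2
  + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i\, w^\top x_i\bigr)
\]
% Dual: one box-constrained coordinate \alpha_i per training example,
% with the primal-dual mapping w(\alpha) shown on the right
\[
\max_{\alpha \in [0,1]^n} \; D(\alpha) = \frac{1}{n}\sum_{i=1}^{n} \alpha_i
  - \frac{1}{2\lambda n^2}\Bigl\|\sum_{i=1}^{n} \alpha_i y_i x_i\Bigr\|^2,
\qquad
w(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^{n} \alpha_i y_i x_i .
\]
```

Because each dual coordinate αi is tied to one training example, partitioning the examples across K workers also partitions the dual variables, which is what allows each machine to run a local dual solver independently between communication rounds.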
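
To make Algorithm 1 concrete, here is a minimal single-process sketch of the COCOA outer loop for the hinge-loss SVM dual, simulating the K workers with list partitions rather than Spark. The function names (`local_sdca`, `cocoa`), defaults, and dense-numpy data layout are illustrative assumptions, not the authors' code; the structure follows the paper's description: each worker takes H local dual coordinate ascent steps on its own coordinates, and the resulting updates are scaled by βK/K before being applied to the shared state.

```python
import numpy as np

def local_sdca(X_k, y_k, alpha_k, w, lam, n, H, rng):
    """Hypothetical LocalDualMethod: H steps of dual coordinate ascent (SDCA)
    restricted to this worker's coordinates. Returns the local changes
    (delta_alpha_k, delta_w_k) without modifying the shared state."""
    delta_alpha = np.zeros_like(alpha_k)
    delta_w = np.zeros_like(w)
    for _ in range(H):
        i = rng.integers(len(y_k))               # pick a local coordinate at random
        x_i, y_i = X_k[i], y_k[i]
        a_old = alpha_k[i] + delta_alpha[i]
        # closed-form coordinate maximizer for the hinge-loss dual (box [0, 1]),
        # evaluated against the locally updated primal vector w + delta_w
        grad = 1.0 - y_i * np.dot(x_i, w + delta_w)
        a_new = np.clip(a_old + lam * n * grad / (np.dot(x_i, x_i) + 1e-12), 0.0, 1.0)
        delta_alpha[i] += a_new - a_old
        delta_w += (a_new - a_old) * y_i * x_i / (lam * n)
    return delta_alpha, delta_w

def cocoa(X_parts, y_parts, lam, T=50, H=1000, beta_K=1.0, seed=0):
    """Single-process sketch of the COCOA outer loop: each 'worker' runs a
    local dual solver, then updates are scaled by beta_K / K and combined."""
    rng = np.random.default_rng(seed)
    K = len(X_parts)
    d = X_parts[0].shape[1]
    n = sum(len(y_k) for y_k in y_parts)
    alphas = [np.zeros(len(y_k)) for y_k in y_parts]   # dual variables, partitioned
    w = np.zeros(d)                                    # shared primal vector
    for _ in range(T):
        deltas = [local_sdca(X_parts[k], y_parts[k], alphas[k], w, lam, n, H, rng)
                  for k in range(K)]                   # "in parallel" across K workers
        for k, (d_alpha, d_w) in enumerate(deltas):
            alphas[k] += (beta_K / K) * d_alpha        # local dual update
            w += (beta_K / K) * d_w                    # reduce step (the only communication)
    return w, alphas
```

Here H is the computation-communication trade-off factor studied in Figure 3 (more local work per round, fewer reduce steps), and βK is the aggregation scaling studied in Figure 4: βK = 1 corresponds to averaging the workers' updates, βK = K to adding them.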
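
As a purely illustrative usage example of the sketch above (reusing its `cocoa` function, on synthetic data rather than the paper's datasets), the H and βK sweeps described in the Experiment Setup row could be reproduced in spirit as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, lam = 4000, 50, 4, 1e-6          # toy stand-ins, not the paper's settings
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))   # synthetic +/-1 labels

# partition examples (and hence dual coordinates) across K simulated workers
idx = np.array_split(rng.permutation(n), K)
X_parts = [X[ix] for ix in idx]
y_parts = [y[ix] for ix in idx]

# sweep the computation-communication factor H (cf. Figure 3)
for H in (100, 1000, 100000):
    w, _ = cocoa(X_parts, y_parts, lam, T=20, H=H, beta_K=1.0)

# sweep the aggregation scaling beta_K (cf. Figure 4)
for beta_K in (1.0, K / 2, float(K)):
    w, _ = cocoa(X_parts, y_parts, lam, T=20, H=100, beta_K=beta_K)
```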