Communication-Efficient Distributed Dual Coordinate Ascent
Authors: Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a communication-efficient framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that as compared to state-of-the-art mini-batch versions of SGD and SDCA algorithms, CoCoA converges to the same .001-accurate solution quality on average 25× as quickly. |
| Researcher Affiliation | Academia | Martin Jaggi (ETH Zurich); Virginia Smith (UC Berkeley); Martin Takáč (Lehigh University); Jonathan Terhorst (UC Berkeley); Sanjay Krishnan (UC Berkeley); Thomas Hofmann (ETH Zurich); Michael I. Jordan (UC Berkeley) |
| Pseudocode | Yes | Algorithm 1: CoCoA: Communication-Efficient Distributed Dual Coordinate Ascent (a minimal sketch of the algorithm appears after this table) |
| Open Source Code | No | The paper does not provide an explicit statement about, or link to, open-source code for the described methodology. |
| Open Datasets | Yes | Table 1 (Datasets for Empirical Study): cov (training n = 522,911, features d = 54, sparsity 22.22%, λ = 1e-6, workers K = 4); rcv1 (n = 677,399, d = 47,236, sparsity 0.16%, λ = 1e-6, K = 8); imagenet (n = 32,751, d = 160,000, sparsity 100%, λ = 1e-5, K = 32) |
| Dataset Splits | No | The paper mentions data distribution across workers and dataset sizes but does not specify train/validation/test splits (e.g., percentages or exact counts) for reproduction. |
| Hardware Specification | Yes | We apply these algorithms to standard hinge loss ℓ2-regularized support vector machines, using implementations written in Spark on m1.large Amazon EC2 instances [1]. |
| Software Dependencies | No | The paper mentions "Spark" as the platform for implementation but does not specify version numbers for Spark or any other software libraries or dependencies. |
| Experiment Setup | Yes | For each algorithm, we additionally study the effect of scaling the average by a parameter βK... The datasets used in these analyses are summarized in Table 1, and were distributed among K = 4, 8, and 32 nodes, respectively. We use the same regularization parameters as specified in [16, 17]. In Figure 3 we explore the effect of H, the computation-communication trade-off factor, on the convergence of CoCoA for the cov dataset on a cluster of 4 nodes. In Figure 4, we attempt to scale the averaging step of each algorithm by using various βK values, for two different batch sizes on the cov dataset (H = 1e5 and H = 100). (An illustrative toy sweep over H is sketched below, after the CoCoA example.) |
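
To make the pseudocode row concrete, below is a minimal sketch of Algorithm 1 (CoCoA) with a LocalSDCA-style subroutine for the hinge-loss ℓ2-regularized SVM dual that the experiments use. This is not the authors' implementation: the names (`local_sdca`, `cocoa`, `parts`) are illustrative, the serial list comprehension stands in for the parallel map that Spark performs across workers, and the closed-form coordinate step is the standard SDCA hinge-loss update (Shalev-Shwartz & Zhang) rather than anything taken from the paper's code.

```python
import numpy as np

def local_sdca(X_k, y_k, alpha_k, w, lam, n, H, rng):
    """H steps of single-coordinate dual ascent (hinge-loss SVM) on one
    worker's partition; returns only the accumulated local updates."""
    delta_alpha = np.zeros_like(alpha_k)
    delta_w = np.zeros_like(w)
    for _ in range(H):
        i = rng.integers(len(y_k))            # random local coordinate
        x_i, y_i = X_k[i], y_k[i]
        a_i = alpha_k[i] + delta_alpha[i]     # current local dual value
        grad = 1.0 - y_i * (x_i @ (w + delta_w))
        # closed-form maximizer of the i-th dual coordinate,
        # clipped to the box constraint 0 <= y_i * alpha_i <= 1
        d = y_i * min(1.0, max(0.0, lam * n * grad / (x_i @ x_i) + y_i * a_i)) - a_i
        delta_alpha[i] += d
        delta_w += d * x_i / (lam * n)        # maintain w = A @ alpha / (lam * n)
    return delta_alpha, delta_w

def cocoa(parts, lam, n, d, T=50, H=1000, beta_K=1.0, seed=0):
    """Outer loop of Algorithm 1: run local_sdca on every partition, then
    add the (beta_K / K)-scaled updates into the shared state."""
    rng = np.random.default_rng(seed)
    K = len(parts)
    w = np.zeros(d)
    alphas = [np.zeros(len(y_k)) for _, y_k in parts]
    for _ in range(T):                        # T communication rounds
        updates = [local_sdca(X_k, y_k, alphas[k], w, lam, n, H, rng)
                   for k, (X_k, y_k) in enumerate(parts)]  # parallel on Spark
        for k, (da, _) in enumerate(updates):
            alphas[k] += (beta_K / K) * da
        w += (beta_K / K) * sum(dw for _, dw in updates)
    return w, alphas
```

Setting beta_K = 1.0 recovers plain averaging of the K local updates; the βK scaling studied in Figure 4 corresponds to other values of this parameter.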
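As a usage note for the experiment-setup row, the toy run below (synthetic data, not the cov/rcv1/imagenet datasets, and with an arbitrary toy λ) illustrates the computation-communication trade-off H from Figure 3: larger H means more local SDCA steps per communication round.

```python
# Toy sweep over H with K = 4 simulated workers (hypothetical data).
rng = np.random.default_rng(1)
n, d, K = 2000, 50, 4
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))
parts = [(X[k::K], y[k::K]) for k in range(K)]   # stride-partition over workers
lam = 1e-4                                        # toy choice, not from the paper
for H in (100, 1_000, 10_000):
    w, _ = cocoa(parts, lam=lam, n=n, d=d, T=10, H=H)
    primal = np.maximum(0.0, 1.0 - y * (X @ w)).mean() + 0.5 * lam * (w @ w)
    print(f"H={H:>6}: primal SVM objective ~ {primal:.4f}")
```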