Communication-Efficient Distributed Dual Coordinate Ascent

Authors: Martin Jaggi, Virginia Smith, Martin Takáč, Jonathan Terhorst, Sanjay Krishnan, Thomas Hofmann, Michael I. Jordan

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we propose a communication-efficient framework, COCOA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that as compared to state-of-the-art mini-batch versions of SGD and SDCA algorithms, COCOA converges to the same .001-accurate solution quality on average 25× as quickly.
Researcher Affiliation | Academia | Martin Jaggi (ETH Zurich), Virginia Smith (UC Berkeley), Martin Takáč (Lehigh University), Jonathan Terhorst (UC Berkeley), Sanjay Krishnan (UC Berkeley), Thomas Hofmann (ETH Zurich), Michael I. Jordan (UC Berkeley)
Pseudocode | Yes | Algorithm 1: COCOA: Communication-Efficient Distributed Dual Coordinate Ascent. (A hedged sketch of this outer loop appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | Table 1: Datasets for Empirical Study. cov: n = 522,911 training points, d = 54 features, sparsity 22.22%, λ = 1e-6, K = 4 workers; rcv1: n = 677,399, d = 47,236, sparsity 0.16%, λ = 1e-6, K = 8 workers; imagenet: n = 32,751, d = 160,000, sparsity 100%, λ = 1e-5, K = 32 workers.
Dataset Splits | No | The paper mentions data distribution across workers and dataset sizes but does not specify train/validation/test splits (e.g., percentages or exact counts) for reproduction.
Hardware Specification | Yes | We apply these algorithms to standard hinge loss ℓ2-regularized support vector machines, using implementations written in Spark on m1.large Amazon EC2 instances [1]. (The hinge-loss SVM objective is written out below the table for reference.)
Software Dependencies | No | The paper mentions Spark as the implementation platform but does not specify version numbers for Spark or any other software libraries or dependencies.
Experiment Setup | Yes | For each algorithm, we additionally study the effect of scaling the average by a parameter βK... The datasets used in these analyses are summarized in Table 1, and were distributed among K = 4, 8, and 32 nodes, respectively. We use the same regularization parameters as specified in [16, 17]. In Figure 3 we explore the effect of H, the computation-communication trade-off factor, on the convergence of COCOA for the Cov dataset on a cluster of 4 nodes. In Figure 4, we attempt to scale the averaging step of each algorithm by using various βK values, for two different batch sizes on the Cov dataset (H = 1e5 and H = 100). (The roles of H and βK are illustrated in the sketch below the table.)
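
For context on what the dual coordinate ascent operates on: the hinge-loss ℓ2-regularized SVM referenced in the Hardware Specification row has a standard primal-dual pair, written here in common SDCA-style notation as a sketch; the paper's own notation may differ slightly.

```latex
% Primal: hinge-loss SVM with L2 regularization (loss averaged over the n examples)
\[
\min_{w \in \mathbb{R}^d} \; P(w) = \frac{\lambda}{2}\|w\|^2
  + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i\, w^\top x_i\bigr)
\]
% Dual: one box-constrained coordinate \alpha_i per training example,
% with the primal-dual mapping w(\alpha) shown on the right
\[
\max_{\alpha \in [0,1]^n} \; D(\alpha) = \frac{1}{n}\sum_{i=1}^{n} \alpha_i
  - \frac{1}{2\lambda n^2}\Bigl\|\sum_{i=1}^{n} \alpha_i y_i x_i\Bigr\|^2,
\qquad
w(\alpha) = \frac{1}{\lambda n}\sum_{i=1}^{n} \alpha_i y_i x_i .
\]
```

Because each dual coordinate αi is tied to one training example, partitioning the examples across K workers also partitions the dual variables, which is what allows each machine to run a local dual solver independently between communication rounds.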
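
To make Algorithm 1 concrete, here is a minimal single-process sketch of the COCOA outer loop for the hinge-loss SVM dual, simulating the K workers with list partitions rather than Spark. The function names (`local_sdca`, `cocoa`), defaults, and dense-numpy data layout are illustrative assumptions, not the authors' code; the structure follows the paper's description: each worker takes H local dual coordinate ascent steps on its own coordinates, and the resulting updates are scaled by βK/K before being applied to the shared state.

```python
import numpy as np

def local_sdca(X_k, y_k, alpha_k, w, lam, n, H, rng):
    """Hypothetical LocalDualMethod: H steps of dual coordinate ascent (SDCA)
    restricted to this worker's coordinates. Returns the local changes
    (delta_alpha_k, delta_w_k) without modifying the shared state."""
    delta_alpha = np.zeros_like(alpha_k)
    delta_w = np.zeros_like(w)
    for _ in range(H):
        i = rng.integers(len(y_k))               # pick a local coordinate at random
        x_i, y_i = X_k[i], y_k[i]
        a_old = alpha_k[i] + delta_alpha[i]
        # closed-form coordinate maximizer for the hinge-loss dual (box [0, 1]),
        # evaluated against the locally updated primal vector w + delta_w
        grad = 1.0 - y_i * np.dot(x_i, w + delta_w)
        a_new = np.clip(a_old + lam * n * grad / (np.dot(x_i, x_i) + 1e-12), 0.0, 1.0)
        delta_alpha[i] += a_new - a_old
        delta_w += (a_new - a_old) * y_i * x_i / (lam * n)
    return delta_alpha, delta_w

def cocoa(X_parts, y_parts, lam, T=50, H=1000, beta_K=1.0, seed=0):
    """Single-process sketch of the COCOA outer loop: each 'worker' runs a
    local dual solver, then updates are scaled by beta_K / K and combined."""
    rng = np.random.default_rng(seed)
    K = len(X_parts)
    d = X_parts[0].shape[1]
    n = sum(len(y_k) for y_k in y_parts)
    alphas = [np.zeros(len(y_k)) for y_k in y_parts]   # dual variables, partitioned
    w = np.zeros(d)                                    # shared primal vector
    for _ in range(T):
        deltas = [local_sdca(X_parts[k], y_parts[k], alphas[k], w, lam, n, H, rng)
                  for k in range(K)]                   # "in parallel" across K workers
        for k, (d_alpha, d_w) in enumerate(deltas):
            alphas[k] += (beta_K / K) * d_alpha        # local dual update
            w += (beta_K / K) * d_w                    # reduce step (the only communication)
    return w, alphas
```

Here H is the computation-communication trade-off factor studied in Figure 3 (more local work per round, fewer reduce steps), and βK is the aggregation scaling studied in Figure 4: βK = 1 corresponds to averaging the workers' updates, βK = K to adding them.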
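
As a purely illustrative usage example of the sketch above (reusing its `cocoa` function, on synthetic data rather than the paper's datasets), the H and βK sweeps described in the Experiment Setup row could be reproduced in spirit as follows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K, lam = 4000, 50, 4, 1e-6          # toy stand-ins, not the paper's settings
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))   # synthetic +/-1 labels

# partition examples (and hence dual coordinates) across K simulated workers
idx = np.array_split(rng.permutation(n), K)
X_parts = [X[ix] for ix in idx]
y_parts = [y[ix] for ix in idx]

# sweep the computation-communication factor H (cf. Figure 3)
for H in (100, 1000, 100000):
    w, _ = cocoa(X_parts, y_parts, lam, T=20, H=H, beta_K=1.0)

# sweep the aggregation scaling beta_K (cf. Figure 4)
for beta_K in (1.0, K / 2, float(K)):
    w, _ = cocoa(X_parts, y_parts, lam, T=20, H=100, beta_K=beta_K)
```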