DiSCO: Distributed Optimization for Self-Concordant Empirical Loss

Authors: Yuchen Zhang, Lin Xiao

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical experiments to compare the DiSCO algorithm with several state-of-the-art distributed optimization algorithms: the ADMM algorithm (Boyd et al., 2010), the accelerated full gradient method (AFG) (Nesterov, 2004), the L-BFGS quasi-Newton method (Nocedal & Wright, 2006, Section 7.2), and the DANE algorithm (Shamir et al., 2014). For comparison, we solve three binary classification tasks using logistic regression. The datasets are obtained from the LIBSVM datasets (Chang & Lin, 2011) and summarized in Table 2. (Figure 2 caption: Comparing DiSCO with other distributed optimization algorithms.)
Researcher Affiliation | Collaboration | Yuchen Zhang (YUCZHANG@EECS.BERKELEY.EDU), University of California Berkeley, Berkeley, CA 94720, USA; Lin Xiao (LIN.XIAO@MICROSOFT.COM), Microsoft Research, Redmond, WA 98053, USA
Pseudocode | Yes | Algorithm 1: Inexact damped Newton method; Algorithm 2: Distributed PCG algorithm; Algorithm 3: DiSCO. (Hedged sketches of the damped Newton step and the PCG subroutine appear after this table.)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | For comparison, we solve three binary classification tasks using logistic regression. The datasets are obtained from the LIBSVM datasets (Chang & Lin, 2011) and summarized in Table 2.
Dataset Splits | Yes | We split each dataset evenly across m machines, with m ∈ {4, 16, 64}. (A data-splitting sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | For Algorithm 3, we choose the input parameter µ = m^(1/2) · µ0, where µ0 is manually chosen to be µ0 = 0 for Covtype, µ0 = 4 × 10^-4 for RCV1, and µ0 = 2 × 10^-4 for News20. The regularization parameter is set to γ = 10^-5. For fair comparison, we manually tune the penalty parameter of ADMM and the regularization parameter µ for DANE to optimize their performance. For AFG, we used an adaptive line search scheme (Nesterov, 2013) to speed up its convergence. For L-BFGS, we adopted the memory size 30, as suggested in Nocedal & Wright (2006). (A minimal code sketch of this setup appears after the table.)
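
For concreteness, below is a minimal sketch of the experimental setup described above: the ℓ2-regularized logistic regression objective with γ = 10^-5 and an even split of a dataset across m machines (m ∈ {4, 16, 64}). The paper releases no code, so the function names and signatures here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def logistic_loss(w, X, y, gamma=1e-5):
    """Regularized logistic regression objective (labels y_i in {-1, +1}):
    (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)) + (gamma/2) * ||w||^2."""
    margins = y * (X @ w)
    # log(1 + exp(-t)) computed stably as logaddexp(0, -t)
    loss = np.mean(np.logaddexp(0.0, -margins))
    return loss + 0.5 * gamma * (w @ w)

def split_evenly(X, y, m):
    """Split (X, y) into m (nearly) equal local datasets, one per machine,
    mimicking the even split used in the experiments."""
    idx = np.array_split(np.arange(X.shape[0]), m)
    return [(X[i], y[i]) for i in idx]
```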
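Algorithm 1 (the inexact damped Newton method) computes an approximate Newton direction v_k ≈ [f''(w_k)]^-1 f'(w_k), forms δ_k = sqrt(v_k^T f''(w_k) v_k), and takes the damped step w_{k+1} = w_k - v_k / (1 + δ_k). The sketch below is a single-machine stand-in under that description: `grad` and `hess` are assumed callables, and a dense solve replaces the distributed PCG subroutine that DiSCO actually uses for the direction.

```python
import numpy as np

def inexact_damped_newton(w0, grad, hess, tol=1e-10, max_iter=100):
    """Damped Newton iteration for a self-concordant objective
    (serial sketch of Algorithm 1; the Newton system is solved
    exactly here rather than by distributed PCG)."""
    w = w0.copy()
    for _ in range(max_iter):
        g = grad(w)
        H = hess(w)
        v = np.linalg.solve(H, g)      # (approximate) Newton direction
        delta = np.sqrt(v @ (H @ v))   # approximate Newton decrement
        w = w - v / (1.0 + delta)      # damped Newton step
        if delta <= tol:               # simple stopping rule (an assumption)
            break
    return w
```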
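Algorithm 2 (the distributed PCG algorithm) solves the Newton system by preconditioned conjugate gradient, where each Hessian-vector product averages local products over the m machines (one communication round per PCG iteration) and the preconditioner is, roughly, the master machine's local Hessian plus µI with µ = m^(1/2) · µ0. The following is a serial simulation of that idea under those assumptions; the variable names, stopping rule, and preconditioner details are not taken verbatim from the paper.

```python
import numpy as np

def distributed_pcg(local_hessians, g, mu, eps=1e-8, max_iter=200):
    """Solve (average of local Hessians) v = g by PCG, preconditioned
    with the first machine's local Hessian plus mu * I.
    Serial simulation: in a real deployment H @ u would be computed by
    one round of communication, not by forming H explicitly."""
    H = sum(local_hessians) / len(local_hessians)
    P = local_hessians[0] + mu * np.eye(len(g))
    v = np.zeros_like(g)
    r = g.copy()                      # residual g - H v, with v = 0
    s = np.linalg.solve(P, r)
    u = s.copy()
    rs = r @ s
    for _ in range(max_iter):
        Hu = H @ u
        alpha = rs / (u @ Hu)
        v += alpha * u
        r -= alpha * Hu
        if np.linalg.norm(r) <= eps:
            break
        s = np.linalg.solve(P, r)
        rs_new = r @ s
        u = s + (rs_new / rs) * u
        rs = rs_new
    return v
```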