DiSCO: Distributed Optimization for Self-Concordant Empirical Loss
Authors: Yuchen Zhang, Lin Xiao
ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct numerical experiments to compare the DiSCO algorithm with several state-of-the-art distributed optimization algorithms: the ADMM algorithm (Boyd et al., 2010), the accelerated full gradient method (AFG) (Nesterov, 2004), the L-BFGS quasi-Newton method (Nocedal & Wright, 2006, Section 7.2), and the DANE algorithm (Shamir et al., 2014). For comparison, we solve three binary classification tasks using logistic regression. The datasets are obtained from the LIBSVM datasets (Chang & Lin, 2011) and summarized in Table 2. Figure 2: Comparing DiSCO with other distributed optimization algorithms. |
| Researcher Affiliation | Collaboration | Yuchen Zhang (yuczhang@eecs.berkeley.edu), University of California, Berkeley, Berkeley, CA 94720, USA; Lin Xiao (lin.xiao@microsoft.com), Microsoft Research, Redmond, WA 98053, USA |
| Pseudocode | Yes | Algorithm 1: Inexact damped Newton method; Algorithm 2: Distributed PCG algorithm; Algorithm 3: DiSCO (a single-machine sketch of the damped Newton step appears below the table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | For comparison, we solve three binary classification tasks using logistic regression. The datasets are obtained from the LIBSVM datasets (Chang & Lin, 2011) and summarized in Table 2. |
| Dataset Splits | Yes | We split each dataset evenly across m machines, with m ∈ {4, 16, 64}. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | For Algorithm 3, we choose the input parameter µ = m^{1/2} µ0, where µ0 is manually chosen to be µ0 = 0 for Covtype, µ0 = 4 × 10^{-4} for RCV1, and µ0 = 2 × 10^{-4} for News20. The regularization parameter is set to be γ = 10^{-5}. For fair comparison, we manually tune the penalty parameter of ADMM and the regularization parameter µ for DANE to optimize their performance. For AFG, we used an adaptive line search scheme (Nesterov, 2013) to speed up its convergence. For L-BFGS, we adopted the memory size 30, as suggested in Nocedal & Wright (2006). (A sketch of this setup follows the table.) |
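
As context for the Experiment Setup and Dataset Splits rows, here is a minimal sketch of the quoted configuration: the ℓ2-regularized logistic regression objective with γ = 10^{-5}, an even split of the data across m ∈ {4, 16, 64} machines, and the DiSCO input parameter µ = m^{1/2} µ0. The array shapes and random data are hypothetical stand-ins for Covtype/RCV1/News20; only γ, m, and µ0 are taken from the quoted text.

```python
import numpy as np

# Hypothetical data standing in for the LIBSVM datasets (Covtype, RCV1, News20).
n, d = 10_000, 100
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

gamma = 1e-5  # regularization parameter from the Experiment Setup row

def regularized_logistic_loss(w, X, y, gamma):
    """Average logistic loss plus (gamma/2)*||w||^2 -- the objective the experiments solve."""
    z = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -z)) + 0.5 * gamma * (w @ w)

print(regularized_logistic_loss(np.zeros(d), X, y, gamma))  # log(2) at w = 0

# Even split of the examples across m machines, m in {4, 16, 64} (Dataset Splits row).
m = 4
shards = np.array_split(np.arange(n), m)  # one index set per machine

# DiSCO's input parameter scales with the number of machines: mu = sqrt(m) * mu0.
mu0 = 4e-4                 # value the paper reports for RCV1
mu = np.sqrt(m) * mu0
```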
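The Pseudocode row lists Algorithm 1 (inexact damped Newton) as the outer loop that Algorithms 2 and 3 build on. Below is a single-machine sketch of one such step, with plain conjugate gradients standing in for the paper's distributed PCG solver: v approximately solves f''(w) v = f'(w), δ = sqrt(vᵀ f''(w) v), and the update is w ← w − v/(1 + δ). The data and iteration counts here are illustrative, not the paper's settings.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

def damped_newton_step(w, grad_fn, hess_vec_fn):
    """One inexact damped Newton step: v ~ H(w)^{-1} grad f(w) via CG, then w <- w - v/(1+delta)."""
    g = grad_fn(w)
    H = LinearOperator((w.size, w.size), matvec=lambda u: hess_vec_fn(w, u))
    v, _ = cg(H, g)                          # inexact Newton direction
    delta = np.sqrt(v @ hess_vec_fn(w, v))   # local norm of the inexact step
    return w - v / (1.0 + delta), delta

# Tiny illustration on a regularized logistic objective (hypothetical data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = rng.choice([-1.0, 1.0], size=200)
gamma = 1e-5

def grad(w):
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))    # sigmoid(-y * x'w)
    return -(X.T @ (y * s)) / len(y) + gamma * w

def hvp(w, u):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # sigmoid(x'w)
    return (X.T @ (p * (1.0 - p) * (X @ u))) / len(y) + gamma * u

w = np.zeros(10)
for _ in range(5):
    w, delta = damped_newton_step(w, grad, hvp)
    print(f"delta = {delta:.2e}")
```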