Newton Method over Networks is Fast up to the Statistical Precision
Authors: Amir Daneshmand, Gesualdo Scutari, Pavel Dvurechensky, Alexander Gasnikov
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we test numerically our theoretical findings on two classes of problems over meshed networks: 1) ridge regression and 2) logistic regression. Other experiments can be found in the supplementary material (cf. Sec. A). The network graph is generated using an Erdős–Rényi model G(m, p), with m = 30 nodes and different values of p to span different levels of connectivity. We compare DiRegINA with the following methods: Distributed (first-order) methods with gradient tracking: we consider SONATA (Sun et al., 2019) and DIGing (Nedic et al., 2017); both build on the idea of gradient tracking, with the former applicable also to constrained problems. For the SONATA algorithm, we simulate two instances, namely SONATA-L (L stands for linearization) and SONATA-F (F stands for full); the former uses only first-order information in the agents' local updates (as DIGing), while the latter exploits function similarity by employing local mirror-descent-based optimization. Distributed accelerated first-order methods: we consider APAPC (Kovalev et al., 2020) and SSDA (Scaman et al., 2017), which employ Nesterov acceleration on the local optimization steps; the former uses primal gradients, while the latter requires gradients of the conjugate functions and Chebyshev acceleration on the consensus steps. These schemes do not leverage any similarity among the local agents' functions. Distributed second-order methods: we implement i) Network Newton-K (NN-K) (Mokhtari et al., 2016b) with K = 1, so that it has the same communication cost per iteration as DiRegINA; ii) SONATA-F (Sun et al., 2019), a mirror-descent-type distributed scheme wherein agents need to solve exactly a strongly convex optimization problem; and iii) Newton Tracking (NT) (Jiaojiao et al., 2020), which has been shown to outperform the majority of distributed second-order methods. All the algorithms are coded in MATLAB R2019a, running on a computer with Intel(R) Core(TM) i7-8650U CPU@1.90GHz, 16.0 GB of RAM, and 64-bit Windows 10. (A hedged Python sketch of this network/mixing-matrix setup is given after the table.) |
| Researcher Affiliation | Academia | 1 School of Industrial Engineering, Purdue University, West Lafayette, IN, USA; 2 Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany; 3 Higher School of Economics (HSE) University, Moscow, Russia; 4 Moscow Institute of Physics and Technology, Dolgoprudny, Russia. Correspondence to: Gesualdo Scutari <gscutari@purdue.edu>. |
| Pseudocode | Yes | Algorithm 1 (DiRegINA). Data: x_i^0 ∈ K and s_i^0 = ∇f_i(x_i^0), τ_i > 0, M_i > 0, for all i. Iterate: ν = 1, 2, ... [S.1] [Local Optimization] Each agent i computes x_i^{ν+}: x_i^{ν+} = argmin_{y ∈ K} F(x_i^ν) + ⟨s_i^ν, y − x_i^ν⟩ + (1/2)⟨(∇²f_i(x_i^ν) + τ_i I)(y − x_i^ν), y − x_i^ν⟩ + (M_i/6)‖y − x_i^ν‖³. (7a) [S.2] [Local Communication] Each agent i updates its local variables according to x_i^{ν+1} = Σ_{j=1}^m (W^K)_{i,j} x_j^{ν+} (7b) and s_i^{ν+1} = Σ_{j=1}^m (W^K)_{i,j} (s_j^ν + ∇f_j(x_j^{ν+1}) − ∇f_j(x_j^ν)). (A hedged Python sketch of one such iteration is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code availability. |
| Open Datasets | Yes | We train ridge regression on the LIBSVM scaled mg dataset (Flake & Lawrence, 2002), which is an instance of (P) with f_i(x) = (1/(2n))‖A_i x − b_i‖² + (λ/2)‖x‖² and K = R^d, with d = 6. We set λ = 1/√N = 0.0269; we estimate β = 0.1457 and µ = 0.0929. The graph parameter p = 0.6, 0.33, 0.28, resulting in the connectivity values ρ ≈ 0.20, 0.41, 0.70, respectively. We compared DiRegINA, NN-1, DIGing, SONATA-F and NT, all initialized from the same random point. The coefficients of the matrix W are chosen according to the Metropolis–Hastings rule (Xiao et al., 2007). The free parameters of the algorithms are tuned manually; specifically: DiRegINA: τ = 2β, M = 1e−3, and K = 1; NN-1: α = 1e−3 and ϵ = 1; DIGing: stepsize equal to 0.5; SONATA-F: τ = 0.27; NT: ϵ = 0.08 and α = 0.1. This tuning corresponds to the best practical performance we observed. [...] We train logistic regression models, regularized by the ℓ2-ball constraint (with radius 1). The problem is an instance of (P), with each f_i(x) = −(1/n) Σ_{j=1}^n [ξ_i^(j) ln(z_i^(j)) + (1 − ξ_i^(j)) ln(1 − z_i^(j))], where z_i^(j) ≜ 1/(1 + e^{−⟨a_i^(j), x⟩}); the binary class labels ξ_i^(j) ∈ {0, 1} and the feature vectors a_i^(j), i = 1, ..., m and j = 1, ..., n, are determined by the data set. We considered the LIBSVM a4a dataset (N = 4,781, d = 123) and synthetic data (N = 900, d = 150). (A hedged Python sketch of the ridge-regression local objective is given after the table.) |
| Dataset Splits | No | The paper mentions using datasets for training and testing, but it does not specify explicit training, validation, and test splits with percentages or sample counts. |
| Hardware Specification | Yes | All the algorithms are coded in MATLAB R2019a, running on a computer with Intel(R) Core(TM) i7-8650U CPU@1.90GHz, 16.0 GB of RAM, and 64-bit Windows 10. |
| Software Dependencies | Yes | All the algorithms are coded in MATLAB R2019a, running on a computer with Intel(R) Core(TM) i7-8650U CPU@1.90GHz, 16.0 GB of RAM, and 64-bit Windows 10. |
| Experiment Setup | Yes | The free parameters of the algorithms are tuned manually; specifically: DiRegINA: τ = 2β, M = 1e−3, and K = 1; NN-1: α = 1e−3 and ϵ = 1; DIGing: stepsize equal to 0.5; SONATA-F: τ = 0.27; NT: ϵ = 0.08 and α = 0.1. |
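
The experiment description quoted above uses an Erdős–Rényi communication graph G(m, p) with m = 30 and a mixing matrix W built via the Metropolis–Hastings rule (Xiao et al., 2007). The authors' MATLAB code is not released, so the following is a minimal Python sketch of that network setup, assuming the standard Metropolis–Hastings weights and `networkx`/`numpy`; the connectivity measure `rho` is computed here as ‖W − (1/m)11ᵀ‖₂, which is our assumption about its exact definition.

```python
# Illustrative network setup (not the authors' MATLAB code): an Erdos-Renyi
# graph G(m, p) and a doubly stochastic mixing matrix W built with the
# Metropolis-Hastings rule.
import networkx as nx
import numpy as np

def metropolis_weights(graph: nx.Graph) -> np.ndarray:
    """Symmetric, doubly stochastic W via the Metropolis-Hastings rule."""
    m = graph.number_of_nodes()
    W = np.zeros((m, m))
    deg = dict(graph.degree())
    for i, j in graph.edges():
        W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))
    return W

rng = np.random.default_rng(0)
m, p = 30, 0.33                      # one of the (m, p) pairs quoted above
G = nx.erdos_renyi_graph(m, p, seed=0)
while not nx.is_connected(G):        # resample until the graph is connected
    G = nx.erdos_renyi_graph(m, p, seed=int(rng.integers(1 << 31)))
W = metropolis_weights(G)

# Assumed connectivity measure: rho = ||W - (1/m) * 1 1^T||_2, i.e. the
# second-largest eigenvalue of W in absolute value (smaller = better connected).
J = np.ones((m, m)) / m
rho = np.linalg.norm(W - J, 2)
print(f"connectivity rho ~= {rho:.2f}")
```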
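The "Open Datasets" row quotes the local ridge-regression loss f_i(x) = (1/(2n))‖A_i x − b_i‖² + (λ/2)‖x‖². Below is a short sketch of that objective together with its gradient and Hessian (the quantities a second-order method needs). The class name `RidgeAgent`, the synthetic data, and the agent split are illustrative stand-ins for the LIBSVM mg partitioning used in the paper.

```python
# Illustrative local ridge-regression objective from the "Open Datasets" row:
#   f_i(x) = (1/(2n)) * ||A_i x - b_i||^2 + (lambda/2) * ||x||^2,   K = R^d.
# The data are synthetic stand-ins; the paper partitions the LIBSVM "mg"
# dataset (d = 6) over m = 30 agents.
import numpy as np

class RidgeAgent:
    def __init__(self, A_i: np.ndarray, b_i: np.ndarray, lam: float):
        self.A, self.b, self.lam = A_i, b_i, lam
        self.n = A_i.shape[0]

    def loss(self, x: np.ndarray) -> float:
        r = self.A @ x - self.b
        return 0.5 / self.n * float(r @ r) + 0.5 * self.lam * float(x @ x)

    def grad(self, x: np.ndarray) -> np.ndarray:
        return self.A.T @ (self.A @ x - self.b) / self.n + self.lam * x

    def hess(self, x: np.ndarray) -> np.ndarray:
        return self.A.T @ self.A / self.n + self.lam * np.eye(self.A.shape[1])

rng = np.random.default_rng(1)
m, n, d = 30, 46, 6                   # ~1380 samples split over 30 agents
lam = 1.0 / np.sqrt(m * n)            # lambda = 1/sqrt(N), as quoted above
agents = [RidgeAgent(rng.standard_normal((n, d)),
                     rng.standard_normal(n), lam) for _ in range(m)]
```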
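Finally, a minimal sketch of one DiRegINA iteration as given in the pseudocode row, restricted to the unconstrained case K = R^d and reusing `agents` and `W` from the two sketches above. The cubic-regularized subproblem (7a) is solved here with a simple fixed-point heuristic on the step norm; that solver choice, the function names, and all parameter values are assumptions for illustration, not taken from the paper.

```python
# Illustrative single iteration of DiRegINA, steps (7a)-(7b), for K = R^d.
# Reuses `agents` and `W` from the sketches above; the cubic-subproblem
# solver is a simple fixed-point heuristic, chosen for brevity.
import numpy as np

def cubic_step(H: np.ndarray, s: np.ndarray, tau: float, M: float,
               iters: int = 50) -> np.ndarray:
    """Approximately solve min_delta <s, delta> + 0.5*delta'(H + tau*I)delta
    + (M/6)*||delta||^3 via a fixed point on r = ||delta||."""
    dim = H.shape[0]
    r, delta = 0.0, np.zeros(dim)
    for _ in range(iters):
        delta = -np.linalg.solve(H + (tau + 0.5 * M * r) * np.eye(dim), s)
        r = np.linalg.norm(delta)
    return delta

def diregina_iteration(agents, W, x, s, grads, tau, M, K=1):
    """One round: local cubic-Newton step (7a), then K gossip rounds with
    gradient tracking (7b)."""
    m = len(agents)
    WK = np.linalg.matrix_power(W, K)
    x_half = np.stack([x[i] + cubic_step(agents[i].hess(x[i]), s[i], tau, M)
                       for i in range(m)])                 # (7a)
    x_new = WK @ x_half                                    # (7b): mix iterates
    new_grads = np.stack([agents[i].grad(x_new[i]) for i in range(m)])
    s_new = WK @ (s + new_grads - grads)                   # track avg gradient
    return x_new, s_new, new_grads

# Usage sketch (parameter values are illustrative, not the paper's tuning):
d = 6
x = np.zeros((len(agents), d))
grads = np.stack([ag.grad(x[i]) for i, ag in enumerate(agents)])
s = grads.copy()
for _ in range(20):
    x, s, grads = diregina_iteration(agents, W, x, s, grads, tau=0.3, M=1e-3)
print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
```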