Statistically Preconditioned Accelerated Gradient Method for Distributed Optimization
Authors: Hadrien Hendrikx, Lin Xiao, Sébastien Bubeck, Francis Bach, Laurent Massoulié
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on real-world datasets illustrate the benefits of acceleration in the ill-conditioned regime. We compare in this section the performance of SPAG with that of DANE and its heavy-ball acceleration, HB-DANE (Yuan and Li, 2019), as well as accelerated gradient descent (AGD). |
| Researcher Affiliation | Collaboration | INRIA, DIENS, PSL Research University, Paris, France; Microsoft Research, Redmond, WA, USA. |
| Pseudocode | Yes | Algorithm 1: SPAG(L_{F/φ}, σ_{F/φ}, x_0) |
| Open Source Code | Yes | We also provide code for SPAG in supplementary material. |
| Open Datasets | Yes | We use two datasets from LibSVM, RCV1 (Lewis et al., 2004) and the preprocessed version of KDD2010 (algebra) (Yu et al., 2010). Accessible at https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html |
| Dataset Splits | No | The paper describes how local datasets are constructed ('by shuffling the LibSVM datasets, and then assigning a fixed portion to each worker') and how a 'preconditioning dataset' is created ('the server subsamples n points from its local dataset'), but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'a sparse implementation of SDCA (Shalev-Shwartz, 2016)' as the method used for local subproblems, but it does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries). |
| Experiment Setup | Yes | We initialize all algorithms at the same point, which is the minimizer of the server's entire local loss (with 10^5 samples regardless of how many samples are used for preconditioning). We choose L_{F/φ} = 1 and tune µ. We use SPAG with σ_{F/φ}^{-1} = 1 + 2µ/λ and HB-DANE with β = (1 − (1 + 2µ/λ)^{-1/2})^2. We keep doing passes over the preconditioning dataset until V_t(x_t) ≤ 10^{-9} (checked at each epoch). See the parameter sketch after this table. |
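
The quoted setup fixes the SPAG and HB-DANE tuning constants as closed-form functions of the tuned parameter µ and the regularization λ. Below is a minimal sketch of that computation, not the authors' released code; the values of `mu` and `lam` are placeholders for illustration, not taken from the paper.

```python
# Minimal sketch of the tuning constants quoted in the experiment setup.
# The paper tunes mu and sets the regularization lam per dataset; the
# values used below are placeholders for illustration only.

def spag_hbdane_constants(mu: float, lam: float):
    """Return (sigma_inv, beta) as described in the setup:
    SPAG uses sigma_{F/phi}^{-1} = 1 + 2*mu/lam, and
    HB-DANE uses beta = (1 - (1 + 2*mu/lam)^{-1/2})^2.
    """
    sigma_inv = 1.0 + 2.0 * mu / lam          # inverse relative strong-convexity constant
    beta = (1.0 - sigma_inv ** (-0.5)) ** 2   # heavy-ball momentum for HB-DANE
    return sigma_inv, beta

if __name__ == "__main__":
    mu, lam = 1e-4, 1e-5                      # placeholder values
    sigma_inv, beta = spag_hbdane_constants(mu, lam)
    print(f"sigma_inv = {sigma_inv:.2f}, beta = {beta:.4f}")
```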