Optimal Gradient Sliding and its Application to Optimal Distributed Optimization Under Similarity

Authors: Dmitry Kovalev, Aleksandr Beznosikov, Ekaterina Borodich, Alexander Gasnikov, Gesualdo Scutari

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Experiments): "We consider the Ridge Regression problem $f(w) := \frac{1}{2N}\sum_{i=1}^{N}(w^\top x_i - y_i)^2 + \lambda\|w\|^2$, where $w$ is the vector of weights of the model, $\{x_i, y_i\}_{i=1}^{N}$ is the training dataset, and $\lambda > 0$ is the regularization parameter. We consider a network with 25 workers (simulated on a single-CPU machine), and use two types of datasets, namely: synthetic and real data. ... Results are summarized in Figure 1: the first two figures from the top left correspond to synthetic data, while the other six are on real data." (A minimal sketch of this objective is given after the table.)
Researcher Affiliation | Collaboration | Dmitry Kovalev (KAUST, Saudi Arabia, dakovalev1@gmail.com); Aleksandr Beznosikov (MIPT, HSE University and Yandex, Russia, anbeznosikov@gmail.com); Ekaterina Borodich (MIPT and HSE University, Russia, borodich.ed@phystech.edu); Alexander Gasnikov (MIPT, HSE University and IITP RAS, Russia, gasnikov@yandex.ru); Gesualdo Scutari (Purdue University, USA, gscutari@purdue.edu)
Pseudocode | Yes | Algorithm 1 (Accelerated Extragradient) ... Algorithm 2 (Extragradient Sliding for VIs). (A generic extragradient step, not the paper's algorithms, is sketched after the table.)
Open Source Code | No | "Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]; we work with small models and simple optimisers, the experiments are easy to reproduce on any PC, but we will try to make the code user-friendly and put it out in the public domain."
Open Datasets | Yes | "For simulations with real data, we considered the LIBSVM library [11] and give each agent a full dataset." (An assumed data-loading snippet is shown after the table.)
Dataset Splits | No | The paper describes the datasets used and some parameters for the experiments, but does not explicitly mention train/validation/test splits beyond indicating the use of a "training dataset" for Ridge Regression. It does not specify percentages or how the data were partitioned for validation.
Hardware Specification | No | The paper mentions "simulated on a single-CPU machine" but provides no specific details on the CPU model, GPU, or other hardware. The checklist ("Questions for Paper Analysis") states: "(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes], Appendix C, but we make small experiments, they can be run on any PC". However, this only points to Appendix C (not provided here), and the main text itself is vague.
Software Dependencies | No | The paper mentions the LIBSVM library as a source of datasets but does not specify any software dependencies or versions for its algorithms or implementation. It does not list specific versions of Python, PyTorch, or any other libraries or solvers.
Experiment Setup | Yes | Section 5.1 (Minimization): "We consider the Ridge Regression problem $f(w) := \frac{1}{2N}\sum_{i=1}^{N}(w^\top x_i - y_i)^2 + \lambda\|w\|^2$ ... For the synthetic dataset we choose the noise level and the regularization parameter such that $L/\delta = 200$ and $L/\lambda = 10^5$. For the real datasets the regularization parameter is chosen such that $L/\lambda = 10^6$. See Table 2 for all values of $L$, $\delta$, $\mu$ and $m$. ... The settings of the methods are made as described in the original papers. For algorithms that assume an absolutely accurate solution of local problems (DANE, SPAG, Acc-SONATA), we use AccGD with an accuracy of $10^{-12}$ as a subsolver." Checklist: "(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]; Section 5 gives information about the problem setup, all methods are tuned as it is done in the original papers."
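The Research Type and Experiment Setup rows quote the objective $f(w) = \frac{1}{2N}\sum_{i=1}^{N}(w^\top x_i - y_i)^2 + \lambda\|w\|^2$ and the rule of picking $\lambda$ from a target $L/\lambda$ ratio. A minimal sketch of that objective and its gradient, with made-up data and identifiers (X, y, reg are illustrative, not the authors' code), could look like this:

```python
# Hedged sketch (not the authors' code) of the Ridge Regression objective
# f(w) = (1/2N) * sum_i (w^T x_i - y_i)^2 + lambda * ||w||^2 and its gradient.
import numpy as np

def ridge_loss(w, X, y, reg):
    """Objective value over N samples plus the L2 penalty reg * ||w||^2."""
    N = X.shape[0]
    r = X @ w - y
    return r @ r / (2 * N) + reg * (w @ w)

def ridge_grad(w, X, y, reg):
    """Gradient: X^T (X w - y) / N + 2 * reg * w."""
    N = X.shape[0]
    return X.T @ (X @ w - y) / N + 2 * reg * w

# Illustrative choice of reg from a target L / lambda ratio (here 1e6), where L
# is taken as the largest eigenvalue of X^T X / N (smoothness of the data term).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = rng.standard_normal(100)
L = np.linalg.eigvalsh(X.T @ X / X.shape[0]).max()
reg = L / 1e6
w0 = np.zeros(10)
print(ridge_loss(w0, X, y, reg), np.linalg.norm(ridge_grad(w0, X, y, reg)))
```

In the paper each of the 25 simulated workers holds such data; the sketch above only covers the single-machine objective, not the distributed setup.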
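The Pseudocode row refers to Algorithm 1 (Accelerated Extragradient) and Algorithm 2 (Extragradient Sliding for VIs). Those algorithms are not reproduced here; as a point of reference, the classical (Korpelevich) extragradient step they build on is sketched below, with an illustrative operator, step size, and iteration count.

```python
# Classical extragradient iteration for a monotone operator F; this is only the
# basic building block, NOT the paper's Algorithm 1 or Algorithm 2.
import numpy as np

def extragradient(F, x0, step, num_iters=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x_half = x - step * F(x)    # extrapolation step
        x = x - step * F(x_half)    # update using the operator at the extrapolated point
    return x

# Toy example: saddle problem min_x max_y x*y, whose VI operator is F(x, y) = (y, -x).
F = lambda z: np.array([z[1], -z[0]])
print(extragradient(F, np.array([1.0, 1.0]), step=0.1))  # converges toward (0, 0)
```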
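The Open Datasets row notes that the real-data experiments draw on the LIBSVM collection. The paper does not say how the files were read; one common option in Python is scikit-learn's load_svmlight_file, used here with a placeholder path rather than a dataset named in the paper.

```python
# Assumed loading route for a LIBSVM-format file; the path is a placeholder and
# no preprocessing from the paper is implied.
from sklearn.datasets import load_svmlight_file

X_sparse, y = load_svmlight_file("path/to/libsvm_dataset.txt")
X = X_sparse.toarray()  # dense features; fine for the small problems considered
print(X.shape, y.shape)
```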