Preserved central model for faster bidirectional compression in distributed settings

Authors: Constantin Philippenko, Aymeric Dieuleveut

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we illustrate the validity of the theoretical results given in the previous section on both synthetic and real datasets, on (1) least-squares linear regression (LSR), (2) logistic regression (LR), and (3) non-convex deep learning. We compare MCM with classical algorithms used in distributed settings: Diana, Artemis, Dore and, of course, the simplest setting SGD, which is the baseline.
Researcher Affiliation | Academia | Constantin Philippenko and Aymeric Dieuleveut, CMAP, École Polytechnique, Institut Polytechnique de Paris; [firstname].[lastname]@polytechnique.edu
Pseudocode | Yes | The pseudocode of Rand-MCM is given in Algorithm 1 in Appendix A.
Open Source Code | Yes | All the code is provided on our github repository.
Open Datasets | Yes | We used 9 different datasets: one toy dataset devoted to linear regression in a homogeneous setting; five datasets commonly used in convex optimization (a9a [8], quantum [7], phishing [8], superconduct [13] and w8a [8]), see Table S1 for more details, with these experiments conducted with heterogeneous workers; and four datasets in non-convex settings (CIFAR-10, Fashion-MNIST, FE-MNIST, MNIST), see Table S2 for more details.
Dataset Splits | No | The paper mentions using various datasets and batch sizes, but does not explicitly provide specific training/validation/test dataset splits (percentages, counts, or references to predefined splits) needed to reproduce the experiment.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions some techniques and libraries used (e.g., stochastic scalar quantization, LIBSVM) but does not provide specific version numbers for the key software components or dependencies needed to replicate the experiment.
Experiment Setup | Yes | Each experiment has been run with N = 20 workers using stochastic scalar quantization [3] w.r.t. the 2-norm. To maximize compression, we always quantize on a single level (s = 2^0), except for PP (s = 2^1) and neural networks (where the value of s depends on the dataset). All experiments are performed without any tuning of the algorithms (e.g., with the same learning rate for all algorithms and without reducing it after a certain number of epochs). In these experiments, we provide results on the log of the excess loss F(w_k) - F_*, averaged over 5 runs (resp. 2) in convex settings (resp. deep learning)... w8a (b = 12).
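
The "Experiment Setup" row refers to stochastic scalar quantization w.r.t. the 2-norm with s levels. The sketch below is a minimal, hedged illustration of that standard operator (QSGD-style), not the authors' exact implementation: the function name stochastic_quantize and the NumPy-based layout are assumptions, and the paper's code (e.g., encoding of signs and norms, or per-layer application in the deep-learning experiments) may differ.

```python
import numpy as np

def stochastic_quantize(v: np.ndarray, s: int = 1, rng=None) -> np.ndarray:
    """Stochastic scalar quantization w.r.t. the 2-norm (illustrative sketch).

    Each coordinate v_i is mapped to sign(v_i) * ||v||_2 * (l_i / s), where
    l_i is one of the two integer levels bracketing s * |v_i| / ||v||_2,
    chosen at random so that the operator is unbiased. s = 1 (i.e. 2^0)
    corresponds to the single-level regime described in the setup above.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    ratio = s * np.abs(v) / norm        # lies in [0, s]
    lower = np.floor(ratio)
    prob_up = ratio - lower             # rounding up with this probability keeps E[Q(v)] = v
    levels = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * norm * levels / s
```

With s = 1, each coordinate is either dropped to zero or sent as its sign times the vector norm, which matches the maximal-compression regime described in the setup row.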