Momentum Provably Improves Error Feedback!
Authors: Ilyas Fatkhullin, Alexander Tyurin, Peter Richtárik
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a surprisingly simple fix which removes this issue both theoretically and in practice: the application of Polyak's momentum to the latest incarnation of EF due to Richtárik et al. [2021], known as EF21. Our algorithm, for which we coin the name EF21-SGDM, improves the communication and sample complexities of previous error feedback algorithms under standard smoothness and bounded variance assumptions, and does not require any further strong assumptions such as bounded gradient dissimilarity. Moreover, we propose a double momentum version of our method that improves the complexities even further. Our proof seems to be novel even when compression is removed from the method, and as such, our proof technique is of independent interest in the study of nonconvex stochastic optimization enriched with Polyak's momentum. (...) 4 Experiments. We consider a nonconvex logistic regression problem $f(x) = -\frac{1}{m} \sum_{i,j} \log\left( \frac{\exp(a_{ij}^\top x_{y_{ij}})}{\sum_{y=1}^{c} \exp(a_{ij}^\top x_y)} \right)$ with a nonconvex regularizer $h(x) = \lambda \sum_{y=1}^{c} \sum_{k=1}^{l} \frac{[x_y]_k^2}{1 + [x_y]_k^2}$, $\lambda = 10^{-3}$, where $x_1, \dots, x_c \in \mathbb{R}^l$, $[\cdot]_k$ denotes the $k$-th coordinate of a vector, $c \ge 2$ is the number of classes, $l$ is the number of features, $m$ is the size of the dataset, and $a_{ij} \in \mathbb{R}^l$ and $y_{ij} \in \{1, \dots, c\}$ are the features and label of sample $j$ on node $i$. The datasets used are MNIST (with l = 784, m = 60 000, c = 10) and real-sim (with l = 20 958, m = 72 309, c = 2) [LeCun et al., 2010, Chang and Lin, 2011]. (A code sketch of this objective appears after the table.) |
| Researcher Affiliation | Academia | Ilyas Fatkhullin (ETH AI Center & ETH Zurich); Alexander Tyurin (KAUST*); Peter Richtárik (KAUST). *King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. |
| Pseudocode | Yes | Algorithm 1 EF21-SGDM (Error Feedback 2021 Enhanced with Polyak Momentum). (...) Algorithm 2 Quadratic Optimization Task Generation Procedure. (...) Algorithm 3 EF21-SGD2M (Error Feedback 2021 Enhanced with Double Momentum). (...) Algorithm 4 EF21-SGDM (abs). (...) Algorithm 5 EF21-STORM/MVR. (A single-process sketch of the EF21-SGDM update appears after the table.) |
| Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in this paper was found. |
| Open Datasets | Yes | The datasets used are MNIST (with l = 784, m = 60 000, c = 10) and real-sim (with l = 20 958, m = 72 309, c = 2) [Le Cun et al., 2010, Chang and Lin, 2011]. (...) We test algorithms on an image recognition task, CIFAR10 [Krizhevsky et al., 2009], with the ResNet18 [He et al., 2016] deep neural network (the number of parameters d ~ 10^7). |
| Dataset Splits | No | No specific training/validation/test split percentages or sample counts are provided for MNIST or real-sim. For CIFAR10, it mentions 'We split CIFAR10 among 5 nodes' but does not specify the train/validation/test splits. |
| Hardware Specification | Yes | The distributed environment was emulated on machines with Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz. |
| Software Dependencies | No | The experiments were implemented in Python 3.7.9. This specifies only the programming language and its version; no libraries or solvers are listed with version numbers, so the software environment cannot be fully reproduced. |
| Experiment Setup | Yes | In all algorithms, the step sizes are fine-tuned from the set {2^k : k in [-20, 20]} and the TopK compressor is used to compress information from the nodes to the master. For EF21-SGDM, we fix the momentum parameter eta = 0.1 in all experiments. Prior to that, we tuned eta in {0.01, 0.1} on the independent dataset w8a (with l = 300, m = 49 749, c = 2). (Sketches of the TopK compressor and the step-size grid search appear after the table.) |
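For concreteness, here is a minimal NumPy sketch of the objective quoted in the Research Type row (multiclass logistic loss plus the nonconvex regularizer). The function name, dense-array layout, and 0-based labels are illustrative assumptions, not taken from the paper's (unreleased) code:

```python
import numpy as np

def objective(X, A, y, lam=1e-3):
    """Nonconvex multiclass logistic regression objective, as quoted above.

    X   : (c, l) parameter matrix, one row x_j per class
    A   : (m, l) feature matrix, one row per sample
    y   : (m,)   integer labels in {0, ..., c-1}
    lam : regularization strength lambda (the paper reports 1e-3)
    """
    logits = A @ X.T                                      # (m, c): inner products a^T x_j
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize exp()
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(y)), y].mean()        # -(1/m) * sum of log-softmax at true class
    reg = lam * np.sum(X**2 / (1.0 + X**2))               # nonconvex regularizer h(x)
    return loss + reg

# Quick sanity check on random data (shapes mimic MNIST: c = 10, l = 784).
rng = np.random.default_rng(0)
print(objective(rng.normal(size=(10, 784)), rng.normal(size=(64, 784)),
                rng.integers(0, 10, 64)))
```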
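The Pseudocode row references Algorithm 1 (EF21-SGDM), whose update is simple enough to sketch: each node i keeps a Polyak momentum estimate v_i of its stochastic gradient and an EF21 state g_i that chases v_i through compressed corrections, while the master averages the g_i and takes the step. Below is a single-process sketch under those assumptions, with the TopK compressor used in the experiments; `grad_fn(i, x)` is a hypothetical per-node stochastic gradient oracle, and the initialization and loop structure are our reading of the algorithm, not the authors' code:

```python
import numpy as np

def top_k(v, k):
    """TopK compressor: keep the k largest-magnitude coordinates, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21_sgdm(grad_fn, x0, n_nodes, gamma, eta=0.1, k=1, n_steps=1000):
    """Single-process sketch of EF21-SGDM (compressed SGD with Polyak momentum).

    grad_fn(i, x) -> stochastic gradient of node i's loss at x (assumed oracle)
    gamma         -> step size; eta -> momentum parameter (the paper fixes eta = 0.1)
    """
    x = x0.copy()
    v = np.stack([grad_fn(i, x) for i in range(n_nodes)])  # momentum estimates v_i
    g = v.copy()                                           # EF21 states g_i
    g_avg = g.mean(axis=0)                                 # master's aggregate
    for _ in range(n_steps):
        x -= gamma * g_avg                                 # master step
        for i in range(n_nodes):
            v[i] = (1 - eta) * v[i] + eta * grad_fn(i, x)  # Polyak momentum update
            c = top_k(v[i] - g[i], k)                      # compressed correction (the only message sent)
            g[i] += c
        # In a real distributed run the master would update g_avg incrementally
        # from the received corrections; here we just re-average.
        g_avg = g.mean(axis=0)
    return x
```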
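Likewise, the step-size tuning reported in the Experiment Setup row is a plain grid search over {2^k : k in [-20, 20]}; a sketch, with `run_experiment` a hypothetical training wrapper not found in the paper:

```python
def run_experiment(gamma):
    """Hypothetical wrapper: train with step size gamma, return the final loss."""
    raise NotImplementedError

# Step sizes fine-tuned from {2^k : k in [-20, 20]}, as reported.
candidate_gammas = [2.0**k for k in range(-20, 21)]
best_gamma = min(candidate_gammas, key=run_experiment)  # keep the best-performing step size
```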