Permutation Compressors for Provably Faster Distributed Nonconvex Optimization

Authors: Rafał Szlendak, Alexander Tyurin, Peter Richtárik

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We corroborate our theoretical results with carefully engineered synthetic experiments on minimizing the average of nonconvex quadratics, and on autoencoder training with the MNIST dataset.
Researcher Affiliation | Academia | Rafał Szlendak (KAUST, Saudi Arabia); Alexander Tyurin (KAUST, Saudi Arabia); Peter Richtárik (KAUST, Saudi Arabia)
Pseudocode | Yes | Algorithm 1: MARINA (a hedged sketch follows the table)
Open Source Code | No | The paper discusses implementation details (Appendix J) but does not provide a specific repository link or an explicit statement about releasing the source code for their methodology.
Open Datasets | Yes | autoencoder training with the MNIST dataset (LeCun et al., 2010)
Dataset Splits | Yes | Initially, we randomly split MNIST into n + 1 parts: D_0, D_1, ..., D_n, where n = 1000 is the number of nodes. Then, for all i ∈ {1, ..., n}, the i-th node takes split D_0 with probability p̂, or split D_i with probability 1 − p̂. We define the chosen split as D̂_i. (A hypothetical reconstruction of this split follows the table.)
Hardware Specification | Yes | All methods are implemented in Python 3.6 and run on a machine with 24 Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz cores with 32-bit precision.
Software Dependencies | No | All methods are implemented in Python 3.6.
Experiment Setup | Yes | We take MARINA's and EF21's parameters prescribed by the theory and perform a grid search for the step sizes for each compressor by multiplying the theoretical ones by powers of two (a tuning sketch follows the table). We fix λ = 1e-6 and dimension d = 1000 (see Figure 1). We then generate optimization tasks with the number of nodes n ∈ {10, 1000, 10000} and L ∈ {0, 0.05, 0.1, 0.21, 0.91}.
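
The paper's Algorithm 1 is MARINA (Gorbunov et al., 2021). Below is a minimal single-process sketch of MARINA paired with a PermK-style permutation compressor, written from the published descriptions rather than the authors' code; the function names, the compressor_factory interface, and the simulation structure are our assumptions.

```python
import numpy as np

def marina(grad_fns, x0, gamma, p, compressor_factory, num_iters, rng):
    """Minimal single-process simulation of MARINA (Gorbunov et al., 2021).

    grad_fns           -- per-node gradient oracles, grad_fns[i](x) -> np.ndarray
    gamma              -- step size
    p                  -- probability of an uncompressed synchronization round
    compressor_factory -- returns a fresh map (node_index, vector) -> compressed
                          vector, sharing randomness across nodes
    """
    n = len(grad_fns)
    x = x0.copy()
    g = np.mean([grad(x) for grad in grad_fns], axis=0)  # g^0: exact average gradient
    for _ in range(num_iters):
        x_new = x - gamma * g                            # x^{t+1} = x^t - gamma * g^t
        if rng.random() < p:
            # Rare round: every node sends its full, uncompressed gradient.
            g = np.mean([grad(x_new) for grad in grad_fns], axis=0)
        else:
            # Usual round: every node compresses its gradient *difference*.
            compress = compressor_factory()              # fresh shared randomness
            g = g + np.mean([compress(i, grad_fns[i](x_new) - grad_fns[i](x))
                             for i in range(n)], axis=0)
        x = x_new
    return x

def perm_k_factory(n, d, rng):
    """PermK sketch for d divisible by n: a shared random permutation assigns
    each node a disjoint block of d/n coordinates, scaled by n so that the
    average of the compressed vectors is unbiased."""
    def factory():
        blocks = rng.permutation(d).reshape(n, d // n)   # node i keeps blocks[i]
        def compress(i, v):
            out = np.zeros_like(v)
            out[blocks[i]] = n * v[blocks[i]]
            return out
        return compress
    return factory
```

With PermK, a fresh permutation is shared by all nodes in every compressed round, which is why the sketch draws a new compressor from the factory at each iteration.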
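
The dataset-split procedure quoted in the table admits a direct reconstruction. This is a hypothetical NumPy sketch (the function name and the use of index arrays are our assumptions; the paper does not publish this code):

```python
import numpy as np

def assign_splits(num_samples, n, p_hat, rng):
    """Partition sample indices into n + 1 parts D_0, ..., D_n; node i then
    takes the shared split D_0 with probability p_hat and its own split D_i
    with probability 1 - p_hat."""
    parts = np.array_split(rng.permutation(num_samples), n + 1)  # D_0, ..., D_n
    return [parts[0] if rng.random() < p_hat else parts[i]       # chosen split of node i
            for i in range(1, n + 1)]
```

Larger p_hat makes the nodes' local datasets more similar, which is how the experiment controls data heterogeneity across the n = 1000 nodes.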
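
The step-size search described in the last row is a one-dimensional grid over powers of two around the theoretical value. A minimal sketch, assuming a run_method callable that trains with a given step size and returns the criterion to minimize (the exact power range is not stated in the quote and is our assumption):

```python
def tune_step_size(run_method, gamma_theory, powers=range(-2, 6)):
    """Return the best step size from a power-of-two grid around the
    theoretical value; run_method(gamma) -> value to minimize, e.g. the
    final squared gradient norm."""
    grid = [gamma_theory * 2.0 ** k for k in powers]
    return min(grid, key=run_method)
```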