Moniqua: Modulo Quantized Communication in Decentralized SGD

Authors: Yucheng Lu, Christopher De Sa

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that Moniqua converges faster with respect to wall-clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit-budgets, allowing 1-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10. [Section 6, Experiments] In this section, we evaluate Moniqua empirically. First, we compare the convergence of Moniqua against other quantized decentralized training algorithms under different network configurations. Second, we compare their validation performance under an extreme bit-budget. Then we investigate Moniqua's scalability on D2 and AD-PSGD. Finally, we introduce several useful techniques for running Moniqua efficiently.
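To make the 1-bit-per-parameter claim concrete, the sketch below illustrates the modulo-quantization idea at the heart of Moniqua: because neighboring workers' models stay close, a worker only needs to transmit a low-bit encoding of its parameters modulo a small bound theta, and the receiver can recover the full value from its own local estimate. This is a minimal NumPy sketch under assumed names (`modulo_quantize`, `modulo_dequantize`) with deterministic rounding standing in for the paper's quantizer, not the paper's exact scheme.

```python
import numpy as np

def modulo_quantize(x, theta, bits=1):
    """Quantize x modulo theta on a uniform grid of 2**bits points.

    Only the residue x mod theta is encoded, so the bit cost is
    independent of the magnitude of x (sketch; deterministic rounding
    replaces the paper's quantizer).
    """
    levels = 2 ** bits
    step = theta / levels
    residue = np.mod(x, theta)            # wrap into [0, theta)
    q = np.round(residue / step) * step   # snap to the quantization grid
    return np.mod(q, theta)               # fold the endpoint theta back to 0

def modulo_dequantize(q, estimate, theta):
    """Recover x from its quantized residue using a local estimate.

    Valid when |x - estimate| stays below roughly theta/2, i.e. the
    workers' models are close enough that the residue identifies x
    uniquely near the estimate.
    """
    diff = np.mod(q - estimate + theta / 2.0, theta) - theta / 2.0
    return estimate + diff
```

For example, with theta = 1 and 1 bit, x = 5.3 is sent as the residue 0.5 and recovered from a local estimate of 5.25 as 5.5, within half a grid step of the true value.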
Researcher Affiliation | Academia | Yucheng Lu, Christopher De Sa; Department of Computer Science, Cornell University, Ithaca, New York, United States.
Pseudocode | Yes | Algorithm 1: Pseudo-code of Moniqua on worker i.
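Reading this row together with the quantizer sketch above, one iteration of a Moniqua-style worker combines quantized gossip averaging with a local SGD step. The following is an illustrative reconstruction, not the paper's exact Algorithm 1: the argument names, the estimate bookkeeping, and the update order are assumptions.

```python
def moniqua_step(i, x_i, models_hat, residues, W, grad_i, lr, theta):
    """Illustrative single iteration on worker i (sketch, not the
    paper's exact Algorithm 1).

    x_i: worker i's own parameters; models_hat[j]: i's local estimate
    of worker j's parameters; residues[j]: low-bit theta-modulo residue
    received from neighbor j; W: doubly-stochastic mixing matrix;
    grad_i: local stochastic gradient; lr: step size.
    """
    n = len(models_hat)
    # Recover each neighbor's parameters from its quantized residue,
    # reusing modulo_dequantize from the sketch above; worker i knows
    # its own parameters exactly.
    recovered = [x_i if j == i else
                 modulo_dequantize(residues[j], models_hat[j], theta)
                 for j in range(n)]
    # Gossip-average with the mixing weights, then take a local SGD step.
    mixed = sum(W[i][j] * recovered[j] for j in range(n))
    return mixed - lr * grad_i
```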
Open Source Code | No | The paper does not explicitly state that source code for Moniqua is released, provide a repository link, or mention its inclusion in supplementary materials.
Open Datasets | Yes | We launch 8 workers connected in a ring topology and train a ResNet20 (He et al., 2016) model on CIFAR10 (Krizhevsky et al., 2014).
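For context on the ring topology in this excerpt, a common doubly-stochastic mixing matrix for an n-worker ring gives each worker weight 1/3 on itself and on each of its two neighbors. The paper's exact mixing weights are not quoted here, so the construction below is an illustrative assumption.

```python
import numpy as np

def ring_mixing_matrix(n=8):
    """Doubly-stochastic mixing matrix for a ring of n workers.

    Illustrative choice matching the 8-worker ring in the experiments:
    uniform weight 1/3 on self and both ring neighbors (the paper's
    actual weights are an assumption here). Every row and column sums
    to 1, as gossip averaging requires.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    return W
```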
Dataset Splits | No | The paper mentions 'validation accuracy' and 'final test accuracy' but does not provide specific details on how the dataset was split into training, validation, and test sets (e.g., percentages or sample counts).
Hardware Specification | Yes | We launch one instance as one worker in the previous formulation, each configured with a 2-core CPU with 4 GB memory and an NVIDIA Tesla P100 GPU.
Software Dependencies | No | All the models and training scripts in this section are implemented in PyTorch and run on Google Cloud Platform. We use MPICH as the communication backend. All the instances are running Ubuntu 16.04... The paper mentions software but does not specify version numbers for PyTorch or MPICH.
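Since the stack is PyTorch over MPICH, the communication layer was presumably initialized along the following lines. This is a hedged sketch, not the authors' script: it requires a PyTorch build compiled with MPI support, and the launch command is an assumption.

```python
import torch.distributed as dist

# Sketch: initialize PyTorch's MPI backend (needs a PyTorch build with
# MPI support); typically launched as `mpirun -np 8 python train.py`.
dist.init_process_group(backend="mpi")
rank = dist.get_rank()                  # this worker's position on the ring
world_size = dist.get_world_size()
left, right = (rank - 1) % world_size, (rank + 1) % world_size

# Quantized residues would then be exchanged with the ring neighbors
# via point-to-point calls such as dist.isend / dist.recv.
```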
Experiment Setup | Yes | In the experiment, we adopt the following hyperparameters for Moniqua: {Momentum = 0.9, Weight Decay = 5e-4, Batch Size = 128, Initial Step Size = 0.1, θk = 2.0}. In the extreme bit-budget experiment, we further adopt the averaging ratio {γ = 5e-3}.
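As a sanity check on these values, a minimal PyTorch setup with the reported hyperparameters might look like the sketch below. The model is a stand-in (the paper trains ResNet-20/110 on CIFAR-10, and torchvision ships no ResNet-20), and θk and γ are Moniqua-specific knobs consumed by the communication/averaging step rather than by the optimizer.

```python
import torch
import torch.nn as nn

# Stand-in module; the paper trains ResNet-20 and ResNet-110 on CIFAR-10.
model = nn.Linear(32 * 32 * 3, 10)

# Reported hyperparameters wired into a standard PyTorch SGD optimizer.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,            # initial step size
    momentum=0.9,
    weight_decay=5e-4,
)
batch_size = 128
theta_k = 2.0          # Moniqua modulo bound θk (used in communication)
gamma = 5e-3           # averaging ratio γ for the extreme bit-budget runs
```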