DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity

Authors: Alexander Tyurin, Peter Richtárik

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, our theory is corroborated in practice: we see a significant improvement in experiments with nonconvex classification and training of deep learning models." and "A Experiments: We have tested all developed algorithms on practical machine learning problems."
Researcher Affiliation | Academia | Alexander Tyurin, KAUST, Saudi Arabia, alexandertiurin@gmail.com; Peter Richtárik, KAUST, Saudi Arabia, richtarik@gmail.com
Pseudocode | Yes | Algorithm 1 (DASHA) and Algorithm 2 (DASHA-SYNC-MVR)
Open Source Code | Yes | Code: https://github.com/mysteryresearcher/dasha
Open Datasets | Yes | "We take the mushrooms dataset (dimension d = 112, number of samples equals 8124) from LIBSVM (Chang & Lin, 2011)" and CIFAR10 (Krizhevsky et al., 2009)
Dataset Splits | No | The paper mentions splitting data (e.g., "randomly split the dataset between 5 nodes") and using specific datasets, but does not provide explicit train/test/validation split percentages, absolute counts, or references to predefined splits for reproducibility. (A sketch of such a per-node split is given after the table.)
Hardware Specification | Yes | "A distributed environment was emulated on a machine with Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz and 64 cores. Deep learning experiments were conducted with NVIDIA A100 GPU with 40GB memory (each deep learning experiment uses at most 5GB of this memory)."
Software Dependencies | Yes | "The code was written in Python 3.6.8 using PyTorch 1.9 (Paszke et al., 2019)."
Experiment Setup | Yes | "In all experiments, we take parameters of algorithms predicted by the theory (stated in the convergence rate theorems of our paper and in (Gorbunov et al., 2021)), except for the step sizes: we fine-tune them using a set of powers of two {2^i | i ∈ [-10, 10]} and use the Rand-K compressor." (A sketch of the Rand-K compressor and this step-size grid is given below.)
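For readers unfamiliar with the compressor named in the Experiment Setup row: Rand-K keeps K uniformly random coordinates of a vector, scales them by d/K so the operator stays unbiased, and zeroes the rest. The snippet below is a minimal NumPy sketch of that operator together with the tuning grid of powers of two; the function name `rand_k` and the use of NumPy are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased Rand-K compressor: keep k random coordinates, scale by d/k."""
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)  # coordinates to transmit
    out = np.zeros_like(x)
    out[idx] = (d / k) * x[idx]                 # rescaling makes E[rand_k(x)] = x
    return out

# Step-size grid described in the quote: powers of two 2^i for i in [-10, 10].
step_sizes = [2.0 ** i for i in range(-10, 11)]
```

With this operator each node transmits only K coordinates (plus their indices) instead of a full d-dimensional vector, which is the communication saving the paper's compression analysis builds on.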
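Similarly, the Dataset Splits row quotes a random partition of the data across 5 nodes without giving exact proportions. The sketch below shows one plausible way to emulate such a per-node split of the mushrooms data, assuming the LIBSVM file is loaded with scikit-learn's `load_svmlight_file`; the seed, variable names, and the use of scikit-learn are our assumptions, not details from the paper.

```python
import numpy as np
from sklearn.datasets import load_svmlight_file

# mushrooms from LIBSVM: d = 112 features, 8124 samples (per the quote above).
X, y = load_svmlight_file("mushrooms")

n_nodes = 5
rng = np.random.default_rng(0)        # fixed seed for a reproducible split
perm = rng.permutation(X.shape[0])    # shuffle sample indices
shards = np.array_split(perm, n_nodes)  # roughly equal shard per node

node_data = [(X[idx], y[idx]) for idx in shards]
```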