MARINA: Faster Non-Convex Distributed Learning with Compression

Authors: Eduard Gorbunov, Konstantin P. Burlachenko, Zhize Li, Peter Richtarik

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct several numerical experiments to justify the theoretical claims of the paper.
Researcher Affiliation | Collaboration | 1 Moscow Institute of Physics and Technology, Moscow, Russia; 2 Yandex, Moscow, Russia; 3 King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.
Pseudocode | Yes | Algorithm 1 MARINA (see the sketch after the table).
Open Source Code | Yes | Our code is available at https://github.com/burlachenkok/marina.
Open Datasets | Yes | binary classification problem involving non-convex loss (11) with LibSVM data (Chang & Lin, 2011)... training ResNet-18 (He et al., 2016) on the CIFAR100 (Krizhevsky et al., 2009) dataset.
Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit train/validation/test splits, percentages, or sample counts for reproducibility. Standard splits might be implied for the well-known datasets, but they are not stated.
Hardware Specification | No | The paper states that 'The distributed environment is simulated' but does not provide any specific hardware details, such as GPU/CPU models or memory specifications, used for these simulations.
Software Dependencies | Yes | The distributed environment is simulated in Python 3.8 using MPI4PY and other standard libraries... The code is written in Python 3.9 using PyTorch 1.7.
Experiment Setup | Yes | Stepsizes for the methods are chosen according to the theory and the batch sizes for VR-MARINA and VR-DIANA are m/100... In all cases, we used the RandK sparsification operator with K ∈ {1, 5, 10} (sketched below)... The number of workers equals 5. Stepsizes for the methods were tuned and the batch sizes are m/50.
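
For readers skimming the setup above: the RandK sparsification operator mentioned there keeps K coordinates chosen uniformly at random, zeroes the rest, and rescales the kept entries by d/K so that the compressor is unbiased in expectation. Below is a minimal NumPy sketch of such an operator; the function name and signature are illustrative and are not taken from the authors' repository.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """RandK sparsifier: keep k random coordinates of x, zero the rest.

    The kept entries are scaled by d/k so that E[rand_k(x)] = x,
    i.e. the compressor is unbiased.
    """
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)  # k indices, uniformly without replacement
    out[idx] = x[idx] * (d / k)
    return out
```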
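
The logic of Algorithm 1 (MARINA) can likewise be summarized in a short single-process simulation: with probability p every worker sends its exact local gradient, and otherwise it sends only a compressed difference of consecutive gradients, which is added to the previous aggregated estimate. The sketch below is an assumption-laden illustration (names such as marina, grad_fns, and compress are hypothetical), not the authors' mpi4py/PyTorch implementation.

```python
def marina(x0, grad_fns, gamma, p, compress, num_iters, rng):
    """Single-machine simulation of MARINA with n simulated workers.

    grad_fns[i](x) should return the gradient of worker i's local loss at x;
    compress is an unbiased compression operator such as rand_k above.
    """
    n = len(grad_fns)
    x = x0.copy()
    g = sum(gf(x) for gf in grad_fns) / n  # g^0: exact gradient at the starting point
    for _ in range(num_iters):
        x_new = x - gamma * g
        if rng.random() < p:
            # rare full synchronization: every worker sends its exact gradient
            g = sum(gf(x_new) for gf in grad_fns) / n
        else:
            # usual step: workers send only compressed gradient differences
            g = g + sum(compress(gf(x_new) - gf(x)) for gf in grad_fns) / n
        x = x_new
    return x
```

In this sketch, compress could be instantiated as, for example, lambda v: rand_k(v, k=10, rng=rng), mirroring the RandK choices reported in the experiments.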