Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity

Authors: Alexander Tyurin, Peter Richtárik

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, our theory is corroborated in practice: we see a significant improvement in experiments with nonconvex classification and training of deep learning models. and A Experiments We have tested all developed algorithms on practical machine learning problems
Researcher Affiliation Academia Alexander Tyurin (KAUST, Saudi Arabia); Peter Richtárik (KAUST, Saudi Arabia)
Pseudocode Yes Algorithm 1 DASHA and Algorithm 2 DASHA-SYNC-MVR
Open Source Code Yes Code: https://github.com/mysteryresearcher/dasha
Open Datasets Yes We take the mushrooms dataset (dimension d = 112, number of samples equals 8124) from LIBSVM (Chang & Lin, 2011) and CIFAR10 (Krizhevsky et al., 2009)
Dataset Splits No The paper mentions splitting data (e.g., 'randomly split the dataset between 5 nodes') and using specific datasets, but does not provide explicit train/test/validation split percentages, absolute counts, or references to predefined splits for reproducibility.
Hardware Specification Yes A distributed environment was emulated on a machine with Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz and 64 cores. Deep learning experiments were conducted with NVIDIA A100 GPU with 40GB memory (each deep learning experiment uses at most 5GB of this memory).
Software Dependencies Yes The code was written in Python 3.6.8 using PyTorch 1.9 (Paszke et al., 2019).
Experiment Setup Yes In all experiments, we take parameters of algorithms predicted by the theory (stated in the convergence rate theorems of our paper and in (Gorbunov et al., 2021)), except for the step sizes, which we fine-tune using a set of powers of two {2^i | i ∈ [-10, 10]}, and use the Rand-K compressor.
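The Rand-K compressor referenced in the setup above keeps K randomly chosen coordinates of a vector and rescales them so the compression is unbiased in expectation. A minimal sketch, assuming the standard unbiased Rand-K definition (the function name and pure-Python representation are illustrative, not taken from the paper's code):

```python
import random

def rand_k(x, k, rng=random):
    """Unbiased Rand-K compressor (standard definition, assumed here):
    keep k uniformly chosen coordinates of x, zero the rest, and scale
    the kept entries by d/k so that E[rand_k(x)] = x."""
    d = len(x)
    idx = rng.sample(range(d), k)  # k distinct coordinates, uniform without replacement
    out = [0.0] * d
    for i in idx:
        out[i] = x[i] * (d / k)  # rescale for unbiasedness
    return out

# Example: compress an 8-dimensional vector down to 2 transmitted coordinates.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
compressed = rand_k(x, k=2)  # only 2 of 8 entries are nonzero
```

Each node then transmits only the k kept coordinates (plus their indices), which is the source of the communication savings the paper studies; the d/k scaling keeps the compressed gradient estimates unbiased.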