Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity
Authors: Alexander Tyurin, Peter Richtárik
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Finally, our theory is corroborated in practice: we see a significant improvement in experiments with nonconvex classification and training of deep learning models." and "A Experiments: We have tested all developed algorithms on practical machine learning problems." |
| Researcher Affiliation | Academia | Alexander Tyurin, KAUST, Saudi Arabia; Peter Richtárik, KAUST, Saudi Arabia |
| Pseudocode | Yes | Algorithm 1 DASHA and Algorithm 2 DASHA-SYNC-MVR |
| Open Source Code | Yes | Code: https://github.com/mysteryresearcher/dasha |
| Open Datasets | Yes | We take the mushrooms dataset (dimension d = 112, number of samples equals 8124) from LIBSVM (Chang & Lin, 2011) and CIFAR10 (Krizhevsky et al., 2009) |
| Dataset Splits | No | The paper mentions splitting data (e.g., 'randomly split the dataset between 5 nodes') and using specific datasets, but does not provide explicit train/test/validation split percentages, absolute counts, or references to predefined splits for reproducibility. |
| Hardware Specification | Yes | A distributed environment was emulated on a machine with Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz and 64 cores. Deep learning experiments were conducted with NVIDIA A100 GPU with 40GB memory (each deep learning experiment uses at most 5GB of this memory). |
| Software Dependencies | Yes | The code was written in Python 3.6.8 using PyTorch 1.9 (Paszke et al., 2019). |
| Experiment Setup | Yes | In all experiments, we take the parameters of algorithms predicted by the theory (stated in the convergence rate theorems of our paper and in (Gorbunov et al., 2021)), except for the step sizes, which we fine-tune using a set of powers of two {2^i \| i ∈ [-10, 10]}, and use the Rand K compressor. (A minimal illustrative sketch of this setup follows the table.) |
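
The experiment setup above relies on a Rand-K sparsification compressor and a step-size grid of powers of two. Below is a minimal sketch of what such a setup could look like; the function names and the unbiased d/K scaling convention are assumptions based on the standard Rand-K definition, not code taken from the authors' repository.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Rand-K compressor: keep k coordinates chosen uniformly at random.

    The surviving coordinates are scaled by d / k so the compressor is
    unbiased, i.e. E[rand_k(x)] = x (standard Rand-K convention; assumed
    here, not confirmed against the DASHA repository).
    """
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

# Step-size grid from the quoted setup: powers of two 2^i for i in [-10, 10].
step_sizes = [2.0 ** i for i in range(-10, 11)]

# Hypothetical usage: pick the step size whose run attains the best final
# objective; `run_method` is a placeholder, not a function from the paper.
# best = min(step_sizes, key=lambda s: run_method(step_size=s))
```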