Asynchronous Distributed Learning: Adapting to Gradient Delays without Prior Knowledge
Authors: Rotem Zamir Aviv, Ido Hakimi, Assaf Schuster, Kfir Yehuda Levy
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the performance of our algorithm on real-world data. The experiments demonstrate our algorithm's robustness and adaptivity. Concretely, when tuning SGD to train a given model for a specific delay regime and then changing the delay regime, its performance might degrade. Conversely, our algorithm maintains its high performance. |
| Researcher Affiliation | Academia | 1 Viterbi Faculty of Electrical and Computer Engineering, Technion, Haifa, Israel; 2 Taub Faculty of Computer Science, Technion, Haifa, Israel; 3 A Viterbi Fellow. Correspondence to: Rotem Zamir Aviv <ratume@campus.technion.ac.il>. |
| Pseudocode | Yes | Algorithm 1 Delay Adaptive Anytime Online to Batch (an illustrative sketch of the underlying scheme appears below the table) |
| Open Source Code | No | The paper does not include an unambiguous statement about releasing the source code for the work described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | We trained anytime SGD on the Fashion-MNIST (Xiao et al., 2017) dataset with a logistic regression model and evaluated it using multi-class log loss. Fashion-MNIST consists of 60,000 training examples and 10,000 test examples, where each example is a 28x28 grayscale clothing image associated with a label from 10 categories. (A runnable illustration of this dataset/model/loss pairing appears below the table.) |
| Dataset Splits | No | Fashion-MNIST consists of 60,000 training examples and 10,000 test examples, where each example is a 28x28 grayscale clothing image associated with a label from 10 categories. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models. It mentions a simulated distributed system but no hardware specifications. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with versions) needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions tuning the learning rate and evaluating with different delay regimes and learning rates. However, it does not explicitly provide concrete hyperparameter values such as the specific learning rate used for the main comparison, batch size, number of epochs, or optimizer settings in the main text. |
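
The Pseudocode row above cites Algorithm 1 (Delay Adaptive Anytime Online to Batch). Below is a minimal Python sketch of the anytime online-to-batch scheme it builds on, not the paper's exact algorithm: the linear averaging weights `alpha_t = t`, the AdaGrad-style norm-based step size, and the names `anytime_sgd`, `grad_fn`, and `D` are all illustrative assumptions standing in for the paper's delay-adaptive rule.

```python
import numpy as np

def anytime_sgd(grad_fn, w0, T, D=1.0, eps=1e-8):
    """Minimal sketch of anytime online-to-batch SGD.

    grad_fn(x, t) returns a stochastic gradient evaluated at the averaged
    query point x; in the asynchronous setting this gradient may arrive
    with a delay. The AdaGrad-style step size below is an illustrative
    assumption, not the paper's exact delay-adaptive rule.
    """
    w = np.asarray(w0, dtype=float).copy()  # online learner's iterate w_t
    x = w.copy()                            # weighted-average query point x_t
    alpha_sum = 0.0                         # running sum of weights alpha_s
    sq_grad_sum = 0.0                       # accumulator for the adaptive step size
    for t in range(1, T + 1):
        alpha = float(t)                    # linear weights alpha_t = t (assumed)
        alpha_sum += alpha
        x += (alpha / alpha_sum) * (w - x)  # x_t = (sum_s alpha_s w_s) / (sum_s alpha_s)
        g = grad_fn(x, t)                   # (possibly stale) gradient queried at x_t
        sq_grad_sum += (alpha * np.linalg.norm(g)) ** 2
        eta = D / np.sqrt(sq_grad_sum + eps)  # AdaGrad-style learning rate
        w -= eta * alpha * g                # online gradient step on alpha_t * <g_t, .>
    return x                                # the averaged point is the output

# Toy usage: f(w) = 0.5 * ||w||^2, whose gradient is w, plus noise.
rng = np.random.default_rng(0)
x_out = anytime_sgd(lambda x, t: x + 0.1 * rng.standard_normal(x.shape),
                    np.ones(5), T=1000)
```

The design point this sketch captures is that gradients are queried at the running weighted average x_t rather than at the raw iterate w_t, which is what makes the online-to-batch conversion "anytime" and lets an adaptive step size respond to whatever (possibly delayed) gradients actually arrive.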
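The Open Datasets row describes the experimental pairing of Fashion-MNIST, a logistic regression model, and multi-class log loss. The sketch below illustrates that pairing only; the paper trains with anytime SGD, whereas scikit-learn's multinomial solver stands in here, and the `fetch_openml` call and 60k/10k slicing reflect the conventional OpenML copy of the dataset, which should be verified before relying on the split.

```python
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Fashion-MNIST on OpenML: 70,000 28x28 grayscale clothing images in
# 10 classes. The copy is conventionally ordered train-then-test, i.e.
# the first 60,000 rows are the canonical training set.
X, y = fetch_openml("Fashion-MNIST", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel intensities to [0, 1]
X_train, y_train = X[:60000], y[:60000]
X_test, y_test = X[60000:], y[60000:]

# Multinomial logistic regression as a stand-in trainer (illustrative only;
# the paper's experiments use anytime SGD, not this solver).
clf = LogisticRegression(max_iter=100).fit(X_train, y_train)

# Evaluate with multi-class log loss, the metric quoted in the row above.
print("multi-class log loss:", log_loss(y_test, clf.predict_proba(X_test)))
```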