ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD
Authors: Karl Bäckström, Marina Papatriantafilou, Philippas Tsigas
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement the analysis with benchmarking the TAIL-τ function, for implementations of Async SGD representative of a variety of system-execution properties relating to scheduling and ordering. We evaluate TAIL-τ, comparing to standard constant step size executions, for relevant DL benchmark applications, namely training the LeNet (LeCun et al., 1998) architecture, as well as a 3-layer MLP, for image recognition on both MNIST and Fashion-MNIST. The evaluation focuses on convergence rates, primarily wall-clock time to ε-convergence (which is the most relevant in practice), as well as number of successful executions, for various precision levels ε. (See the ε-convergence timing sketch after this table.) |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden. |
| Pseudocode | Yes | Algorithm 1 (Staleness-adaptive shared-memory Async SGD). GLOBAL: loss function L, iteration counter t, max. number of iterations T, shared state θ, step size function η(τ). (See the Async SGD sketch after this table.) |
| Open Source Code | Yes | The implementation extends the open Shared-Memory-SGD (Bäckström, 2021) C++ library, connecting ANN operations to low-level implementations of parallel SGD, and is free to use for research purposes. Bäckström, K. shared-memory-sgd. https://github.com/dcs-chalmers/shared-memory-sgd, 2021. |
| Open Datasets | Yes | We tackle the problem of ANN training for image classification on the datasets MNIST (LeCun & Cortes, 2010) of hand-written digits, CIFAR-10 (Krizhevsky et al., 2009) of everyday objects, and Fashion-MNIST (Xiao et al., 2017) of clothing article images. |
| Dataset Splits | No | The paper mentions using MNIST, Fashion-MNIST, and CIFAR-10 datasets and specifies training for 100 epochs with a mini-batch size of 128, but does not explicitly state the train/validation/test dataset splits used for reproduction. |
| Hardware Specification | Yes | The experiments are conducted on a 2.10 GHz Intel(R) Xeon(R) E5-2695 two-socket 36-core (18 cores per socket, each supporting two hyper-threads), 64GB non-uniform memory access (NUMA), Ubuntu 16.04 system. |
| Software Dependencies | No | The paper mentions that the implementation extends a C++ library and that experiments were conducted on an 'Ubuntu 16.04 system', but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | For MNIST and Fashion-MNIST training we use a base step size of η0 = 1e-4 and mini-batch size 128, while for CIFAR-10 we use η0 = 5e-3 and a mini-batch size of 16. The multi-class cross-entropy loss function is used in all experiments. For Leashed-SGD, we use the default setting of an infinite persistence bound. We use a TAIL-τ step size function (as in Definition 4.5), that adapts to each unique execution, based on the measured staleness distribution, with an adaptation amplitude of A = 1, due to its role in emphasizing fresh updates and dampening stragglers. The experiments are conducted on a 2.10 GHz Intel(R) Xeon(R) E5-2695 two-socket 36-core (18 cores per socket, each supporting two hyper-threads), 64GB non-uniform memory access (NUMA), Ubuntu 16.04 system. (See the TAIL-τ step-size sketch after this table.) |
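The "Pseudocode" row above summarizes Algorithm 1. As a concrete illustration, here is a minimal C++ sketch of staleness-adaptive shared-memory Async SGD: each worker reads the shared parameters, computes a gradient, measures the staleness τ of its update via the global iteration counter, and applies the update with step size η(τ). All names (`SharedState`, `worker`, `grad`, `eta`) are illustrative assumptions, not the API of the Shared-Memory-SGD library.

```cpp
// Minimal sketch of Algorithm 1 (staleness-adaptive shared-memory Async SGD).
// All names here are illustrative; the paper's actual implementation is the
// Shared-Memory-SGD C++ library linked above.
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

struct SharedState {
    std::vector<double> theta;   // shared model parameters (lock-free access)
    std::atomic<long> t{0};      // global iteration counter
};

using Grad = std::function<std::vector<double>(const std::vector<double>&)>;
using StepSize = std::function<double(long)>;

// One asynchronous worker thread: read theta, compute a (possibly stale)
// gradient, then apply it with a step size eta(tau) that depends on the
// measured staleness tau of the update.
void worker(SharedState& s, long T, const Grad& grad, const StepSize& eta) {
    for (;;) {
        long t_read = s.t.load();                 // iteration when theta was read
        if (t_read >= T) return;                  // max number of iterations reached
        std::vector<double> g = grad(s.theta);    // gradient on the current view
        long t_apply = s.t.fetch_add(1);          // iteration when update is applied
        long tau = t_apply - t_read;              // staleness of this update
        double step = eta(tau);                   // staleness-adaptive step size
        for (std::size_t i = 0; i < g.size(); ++i)
            s.theta[i] -= step * g[i];            // Hogwild!-style in-place write
    }
}
```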
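The TAIL-τ step size function of Definition 4.5 adapts η(τ) to the staleness distribution measured in the current execution, with amplitude A emphasizing fresh updates and dampening stragglers. The exact formula is in the paper; the CDF-based scaling below (step sizes in [η0(1−A), η0(1+A)], larger for staleness below the empirical median) is only one plausible form consistent with that description, and the class `TailTauStepSize` is hypothetical.

```cpp
// Hypothetical TAIL-tau style step size: adapts eta0 using the empirical
// staleness distribution of this execution. The scaling below is an
// illustrative assumption, not the paper's Definition 4.5 verbatim.
#include <cstddef>
#include <vector>

class TailTauStepSize {
public:
    TailTauStepSize(double eta0, double A) : eta0_(eta0), A_(A) {}

    // Record the staleness of each applied update (builds a histogram).
    void observe(long tau) {
        if (static_cast<std::size_t>(tau) >= counts_.size())
            counts_.resize(tau + 1, 0);
        ++counts_[tau];
        ++n_;
    }

    // Step size for staleness tau: updates fresher than the empirical median
    // get more than eta0, staler ones less; A in [0,1] sets the amplitude.
    double eta(long tau) const {
        if (n_ == 0) return eta0_;                        // no statistics yet
        long below = 0;
        for (long s = 0; s < tau && s < static_cast<long>(counts_.size()); ++s)
            below += counts_[s];
        double cdf = static_cast<double>(below) / n_;     // empirical P[staleness < tau]
        return eta0_ * (1.0 + A_ * (1.0 - 2.0 * cdf));    // range [eta0(1-A), eta0(1+A)]
    }

private:
    double eta0_, A_;
    std::vector<long> counts_;   // histogram of observed staleness values
    long n_ = 0;
};
```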
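Finally, the evaluation's headline metric is wall-clock time to ε-convergence. One plausible reading, sketched below with an illustrative `ConvergenceTimer` class, is the elapsed wall-clock time until the training loss first reaches the target precision ε.

```cpp
// Illustrative time-to-epsilon-convergence measurement: seconds of wall-clock
// time until the loss first reaches the target precision epsilon.
#include <chrono>
#include <optional>

class ConvergenceTimer {
public:
    explicit ConvergenceTimer(double epsilon)
        : epsilon_(epsilon), start_(std::chrono::steady_clock::now()) {}

    // Call after each evaluation; returns elapsed seconds exactly once,
    // the first time the loss drops to epsilon or below.
    std::optional<double> record(double loss) {
        if (converged_ || loss > epsilon_) return std::nullopt;
        converged_ = true;
        auto dt = std::chrono::steady_clock::now() - start_;
        return std::chrono::duration<double>(dt).count();
    }

private:
    double epsilon_;
    std::chrono::steady_clock::time_point start_;
    bool converged_ = false;
};
```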