ASAP.SGD: Instance-based Adaptiveness to Staleness in Asynchronous SGD

Authors: Karl Bäckström, Marina Papatriantafilou, Philippas Tsigas

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement the analysis by benchmarking the TAIL-τ function for implementations of Async SGD representative of a variety of system-execution properties relating to scheduling and ordering. We evaluate TAIL-τ, comparing against standard constant-step-size executions, on relevant DL benchmark applications, namely training the LeNet (LeCun et al., 1998) architecture as well as a 3-layer MLP for image recognition on both MNIST and Fashion-MNIST. The evaluation focuses on convergence rates, primarily wall-clock time to ε-convergence (the most relevant metric in practice), as well as the number of successful executions, for various precision levels ε. (A small sketch of this metric follows the table.)
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden.
Pseudocode | Yes | Algorithm 1: Staleness-adaptive shared-memory Async SGD. GLOBAL: loss function L, iteration counter t, max. no. of iterations T, shared state θ, step size function η(τ). (A minimal illustrative sketch of such an algorithm follows the table.)
Open Source Code | Yes | The implementation extends the open Shared-Memory-SGD (Bäckström, 2021) C++ library, connecting ANN operations to low-level implementations of parallel SGD, and is free to use for research purposes. Bäckström, K. shared-memory-sgd. https://github.com/dcs-chalmers/shared-memory-sgd, 2021.
Open Datasets | Yes | We tackle the problem of ANN training for image classification on the datasets MNIST (LeCun & Cortes, 2010) of hand-written digits, CIFAR-10 (Krizhevsky et al., 2009) of everyday objects, and Fashion-MNIST (Xiao et al., 2017) of clothing article images.
Dataset Splits | No | The paper mentions using the MNIST, Fashion-MNIST, and CIFAR-10 datasets and specifies training for 100 epochs with a mini-batch size of 128, but does not explicitly state the train/validation/test splits used for reproduction.
Hardware Specification | Yes | The experiments are conducted on a 2.10 GHz Intel(R) Xeon(R) E5-2695 two-socket 36-core (18 cores per socket, each supporting two hyper-threads), 64 GB non-uniform memory access (NUMA), Ubuntu 16.04 system.
Software Dependencies | No | The paper mentions that the implementation extends a C++ library and that experiments were conducted on an Ubuntu 16.04 system, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | For MNIST and Fashion-MNIST training we use a base step size of η0 = 1e-4 and mini-batch size 128, while for CIFAR-10 we use η0 = 5e-3 and a mini-batch size of 16. The multi-class cross-entropy loss function is used in all experiments. For Leashed-SGD, we use the default setting of an infinite persistence bound. We use a TAIL-τ step size function (as in Definition 4.5) that adapts to each unique execution, based on the measured staleness distribution, with an adaptation amplitude of A = 1, due to its role in emphasizing fresh updates and dampening stragglers. The experiments are conducted on a 2.10 GHz Intel(R) Xeon(R) E5-2695 two-socket 36-core (18 cores per socket, each supporting two hyper-threads), 64 GB non-uniform memory access (NUMA), Ubuntu 16.04 system. (A hedged sketch of such a distribution-based step size follows the table.)
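
The Pseudocode row above names the ingredients of Algorithm 1: a global loss function L, an iteration counter t, a maximum number of iterations T, a shared state θ, and a staleness-dependent step size function η(τ). As a rough illustration only, the following is a minimal, self-contained C++ sketch of staleness-adaptive shared-memory Async SGD on a toy quadratic loss. The worker structure, the lock-free (Hogwild-style) writes, and the simple decaying η(τ) are assumptions for illustration; they do not reproduce the paper's Algorithm 1 or the Shared-Memory-SGD library's API.

```cpp
// Minimal sketch of staleness-adaptive shared-memory Async SGD (illustrative only).
// Toy problem: minimize L(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
// The step size eta(tau) here simply decays with staleness; the paper's TAIL-tau
// function is distribution-based (see the next sketch).
#include <array>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int DIM = 4;            // dimension of the shared state theta
constexpr long T_MAX = 100000;    // max. number of iterations T
constexpr int NUM_WORKERS = 4;    // number of asynchronous worker threads

std::atomic<long> t{0};                       // global iteration counter t
std::array<std::atomic<double>, DIM> theta;   // shared state theta

// Hypothetical staleness-adaptive step size: fresh updates (tau = 0) get the full
// base step size, stale updates are dampened.
double eta_of_tau(long tau) {
    const double eta0 = 1e-2;
    return eta0 / (1.0 + static_cast<double>(tau));
}

void worker() {
    while (true) {
        // Record the logical time at which theta is read.
        long t_read = t.load(std::memory_order_acquire);
        if (t_read >= T_MAX) return;

        // Read a (possibly inconsistent) snapshot of the shared state.
        std::array<double, DIM> local;
        for (int i = 0; i < DIM; ++i) local[i] = theta[i].load(std::memory_order_relaxed);

        // Gradient of the toy loss is theta itself; a real implementation would
        // compute a mini-batch gradient here.
        std::array<double, DIM> grad = local;

        // Claim an iteration number; the staleness tau is the number of updates
        // applied by other workers since this worker read theta.
        long t_apply = t.fetch_add(1, std::memory_order_acq_rel);
        long tau = t_apply - t_read;

        // Apply the update with the staleness-adaptive step size eta(tau).
        double step = eta_of_tau(tau);
        for (int i = 0; i < DIM; ++i) {
            // Lock-free (Hogwild-style) write; the paper's library provides its own
            // shared-memory consistency mechanisms.
            double cur = theta[i].load(std::memory_order_relaxed);
            theta[i].store(cur - step * grad[i], std::memory_order_relaxed);
        }
    }
}

int main() {
    for (auto& x : theta) x.store(1.0);   // initial shared state

    std::vector<std::thread> workers;
    for (int w = 0; w < NUM_WORKERS; ++w) workers.emplace_back(worker);
    for (auto& w : workers) w.join();

    double loss = 0.0;
    for (const auto& x : theta) loss += 0.5 * x.load() * x.load();
    std::printf("final loss: %.3e after %ld updates\n", loss, t.load());
    return 0;
}
```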
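The experiment setup above uses a TAIL-τ step size function with base step size η0 and adaptation amplitude A = 1, built from the measured staleness distribution of each execution so that fresh updates are emphasized and stragglers dampened. Definition 4.5 is not reproduced in this summary, so the sketch below uses a hypothetical tail-based form, η(τ) = η0 · (1 + A·(2·P[T ≥ τ] − 1)), chosen only to illustrate scaling by the empirical tail probability of the observed staleness; the class name and the scaling rule are assumptions, not the paper's definition.

```cpp
// Hypothetical sketch of a TAIL-tau-style step size built from a measured staleness
// histogram. The scaling eta0 * (1 + A*(2*P[T >= tau] - 1)) is an assumption chosen to
// match the stated role of the amplitude A: emphasizing fresh (low-staleness) updates
// and dampening stragglers. It is not the paper's Definition 4.5.
#include <cstdio>
#include <vector>

class TailTauStepSize {
public:
    TailTauStepSize(double eta0, double amplitude) : eta0_(eta0), a_(amplitude) {}

    // Record one observed staleness value (e.g. collected while the execution runs).
    void record(int tau) {
        if (tau >= static_cast<int>(counts_.size())) counts_.resize(tau + 1, 0);
        ++counts_[tau];
        ++total_;
    }

    // Step size for an update with staleness tau, scaled by the empirical tail
    // probability P[T >= tau] of the recorded staleness distribution.
    double eta(int tau) const {
        if (total_ == 0) return eta0_;   // no measurements yet: fall back to eta0
        long at_least = 0;
        for (int s = tau; s < static_cast<int>(counts_.size()); ++s) at_least += counts_[s];
        double tail = static_cast<double>(at_least) / static_cast<double>(total_);
        return eta0_ * (1.0 + a_ * (2.0 * tail - 1.0));   // in [eta0*(1-A), eta0*(1+A)]
    }

private:
    std::vector<long> counts_;   // histogram of observed staleness values
    long total_ = 0;             // number of recorded observations
    double eta0_;                // base step size
    double a_;                   // adaptation amplitude A
};

int main() {
    // Base step size and amplitude as in the MNIST/Fashion-MNIST setup (eta0 = 1e-4, A = 1).
    TailTauStepSize step(1e-4, 1.0);
    // Toy staleness measurements; in the experiments these come from the actual execution.
    for (int tau : {0, 0, 1, 1, 1, 2, 2, 3, 5, 8}) step.record(tau);
    for (int tau : {0, 2, 8}) std::printf("eta(%d) = %.2e\n", tau, step.eta(tau));
    return 0;
}
```

With A = 1, the most stale updates in the recorded distribution are scaled towards zero while fresh updates receive up to twice the base step size, which is one way to realize the "emphasize fresh, dampen stragglers" behaviour described above.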
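The evaluation described in the Research Type row reports wall-clock time to ε-convergence and the number of successful executions across precision levels ε. The following is a small, self-contained sketch of how such a metric can be recorded; the exponentially decaying stand-in loss and all variable names are placeholders, not the paper's evaluation harness.

```cpp
// Sketch of recording wall-clock time to epsilon-convergence for several precision
// levels. The decaying stand-in loss replaces an actual (async) SGD training loop.
#include <chrono>
#include <cmath>
#include <cstdio>
#include <map>
#include <vector>

int main() {
    const std::vector<double> epsilons = {1e-1, 1e-2, 1e-3};  // precision levels to report
    std::map<double, double> time_to_eps;                     // epsilon -> seconds (if reached)

    const auto start = std::chrono::steady_clock::now();
    for (int epoch = 0; epoch < 100; ++epoch) {
        // Placeholder for one epoch of training; here the loss simply decays.
        double loss = std::exp(-0.15 * epoch);
        double elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        for (double eps : epsilons)
            if (loss <= eps && time_to_eps.find(eps) == time_to_eps.end())
                time_to_eps[eps] = elapsed;   // first time the target precision is reached
    }
    // An execution counts as successful for a given epsilon only if that level was reached.
    for (double eps : epsilons) {
        if (time_to_eps.count(eps))
            std::printf("eps = %.0e reached after %.3f s\n", eps, time_to_eps[eps]);
        else
            std::printf("eps = %.0e not reached (unsuccessful execution)\n", eps);
    }
    return 0;
}
```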