Ordered Momentum for Asynchronous SGD

Authors: Chang-Wei Shi, Yi-Rui Yang, Wu-Jun Li

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that OrMo can achieve better convergence performance compared with ASGD and other asynchronous methods with momentum. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, School of Computer Science, Nanjing University, Nanjing, China |
| Pseudocode | Yes | Algorithm 1 (Distributed SGD), Algorithm 2 (OrMo), Algorithm 3 (SSGDm), Algorithm 4 (naive ASGDm); see the ASGDm sketch after this table. |
| Open Source Code | Yes | Question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [Yes]. Justification: "We provide the code. The dataset is public." |
| Open Datasets | Yes | "We evaluate these methods by training ResNet20 model [11] on CIFAR10 and CIFAR100 datasets [14]." |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, nor does it refer to a validation set with quantitative details. |
| Hardware Specification | Yes | "The experiments for the CIFAR10 dataset are conducted on NVIDIA RTX 2080 Ti GPUs. ... The experiments for the CIFAR100 dataset are conducted on NVIDIA V100 GPUs." |
| Software Dependencies | Yes | "All the methods are implemented with PyTorch 1.3." |
| Experiment Setup | Yes | "The number of workers is set to 16 and 64. The batch size on each worker is set to 64. The momentum coefficient is set to 0.9. Each experiment is repeated 5 times. ... For the CIFAR10 dataset, the weight decay is set to 0.0001 and the model is trained for 160 epochs. The learning rate is multiplied by 0.1 at the 80th and 120th epochs. ... For the CIFAR100 dataset, the weight decay is set to 0.0005 and the model is trained for 200 epochs. The learning rate is multiplied by 0.2 at the 60th, 120th, and 160th epochs." See the configuration sketch after this table. |
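The Pseudocode row names Algorithm 4 as naive ASGDm, i.e., the standard way of adding momentum to asynchronous SGD: the server applies a heavy-ball momentum update to each worker gradient as it arrives, however stale that gradient is. The following is a minimal single-process sketch of that server loop, not the paper's implementation; the function name `naive_asgdm_server`, the `grad_stream` simulation, and the learning rate of 0.1 are assumptions for illustration.

```python
import torch

# Hypothetical single-process simulation of the parameter server in naive
# ASGDm: a momentum buffer is updated with each worker gradient as it
# arrives, regardless of how stale the gradient is.
def naive_asgdm_server(grad_stream, param, lr=0.1, momentum=0.9):
    """grad_stream yields (worker_id, gradient) pairs in arrival order."""
    buf = torch.zeros_like(param)       # server-side momentum buffer
    for worker_id, grad in grad_stream:
        buf.mul_(momentum).add_(grad)   # heavy-ball momentum accumulation
        param.add_(buf, alpha=-lr)      # apply the update immediately
    return param

# Toy usage: gradients arrive out of iteration order, mimicking asynchrony.
param = torch.zeros(4)
stream = [(w, torch.randn(4)) for w in [0, 2, 1, 3, 0, 1]]
naive_asgdm_server(iter(stream), param)
```

OrMo (Algorithm 2) is the paper's remedy for the staleness this loop ignores; its details are not reproduced in this table, so no sketch of it is attempted here.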
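The hyperparameters in the Experiment Setup row map directly onto a standard PyTorch training configuration. The sketch below shows one plausible CIFAR10 setup under those numbers; the stand-in model, the base learning rate of 0.1, and the data-loading details are assumptions, since the row does not specify them.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Stand-in module for illustration; the paper trains ResNet20, whose
# constructor is not given in this table.
model = torch.nn.Linear(32 * 32 * 3, 10)

# CIFAR10 settings from the Experiment Setup row: momentum 0.9, weight decay
# 1e-4, 160 epochs, LR multiplied by 0.1 at epochs 80 and 120. The base
# learning rate of 0.1 is an assumption; the row does not state it.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

# CIFAR100 analogue per the same row: weight_decay=5e-4, 200 epochs,
# MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2).

for epoch in range(160):
    # ... one epoch of training: 16 or 64 workers, batch size 64 each ...
    scheduler.step()
```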