Ordered Momentum for Asynchronous SGD
Authors: Chang-Wei Shi, Yi-Rui Yang, Wu-Jun Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that OrMo can achieve better convergence performance compared with ASGD and other asynchronous methods with momentum. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, School of Computer Science, Nanjing University, Nanjing, China |
| Pseudocode | Yes | Algorithm 1 Distributed SGD, Algorithm 2 OrMo, Algorithm 3 SSGDm, Algorithm 4 naive ASGDm |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the code. The dataset is public. |
| Open Datasets | Yes | We evaluate these methods by training ResNet20 model [11] on CIFAR10 and CIFAR100 datasets [14]. |
| Dataset Splits | No | The paper does not explicitly specify training/validation/test dataset splits, nor does it describe a validation set with quantitative details. |
| Hardware Specification | Yes | The experiments for the CIFAR10 dataset are conducted on NVIDIA RTX 2080 Ti GPUs. ... The experiments for the CIFAR100 dataset are conducted on NVIDIA V100 GPUs. |
| Software Dependencies | Yes | All the methods are implemented with PyTorch 1.3. |
| Experiment Setup | Yes | The number of workers is set to 16 and 64. The batch size on each worker is set to 64. The momentum coefficient is set to 0.9. Each experiment is repeated 5 times. ... For the CIFAR10 dataset, the weight decay is set to 0.0001 and the model is trained with 160 epochs. The learning rate is multiplied by 0.1 at the 80-th and 120-th epoch... For the CIFAR100 dataset, the weight decay is set to 0.0005 and the model is trained with 200 epochs. The learning rate is multiplied by 0.2 at the 60-th, 120-th and 160-th epoch... |
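As a hedged illustration of the CIFAR-10 training configuration reported in the Experiment Setup row above (batch size 64 per worker, momentum 0.9, weight decay 0.0001, 160 epochs, learning rate multiplied by 0.1 at epochs 80 and 120), the following minimal PyTorch sketch sets up the optimizer and learning-rate schedule. The initial learning rate, the model, and the data loading are placeholders, since the section does not report them; this is a single-worker configuration sketch, not the authors' distributed OrMo implementation.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Hyperparameters taken from the Experiment Setup row (CIFAR-10 setting).
BATCH_SIZE_PER_WORKER = 64
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
EPOCHS = 160
LR_MILESTONES = [80, 120]
LR_GAMMA = 0.1
BASE_LR = 0.1  # assumption: the section does not report the initial learning rate

# Placeholder model standing in for the ResNet20 used in the paper.
model = torch.nn.Linear(3 * 32 * 32, 10)

optimizer = SGD(model.parameters(), lr=BASE_LR,
                momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
scheduler = MultiStepLR(optimizer, milestones=LR_MILESTONES, gamma=LR_GAMMA)

for epoch in range(EPOCHS):
    # ... one pass over the CIFAR-10 training set with batch size 64 per worker ...
    scheduler.step()
```

For the CIFAR-100 setting, the table instead reports weight decay 0.0005, 200 epochs, and a decay factor of 0.2 at epochs 60, 120, and 160; the same sketch applies with those constants swapped in.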