Ordered Momentum for Asynchronous SGD

Authors: Chang-Wei Shi, Yi-Rui Yang, Wu-Jun Li

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that OrMo can achieve better convergence performance compared with ASGD and other asynchronous methods with momentum. |
| Researcher Affiliation | Academia | National Key Laboratory for Novel Software Technology, School of Computer Science, Nanjing University, Nanjing, China |
| Pseudocode | Yes | Algorithm 1 (Distributed SGD), Algorithm 2 (OrMo), Algorithm 3 (SSGDm), Algorithm 4 (naive ASGDm); see the ASGDm sketch after this table. |
| Open Source Code | Yes | Question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [Yes]. Justification: "We provide the code. The dataset is public." |
| Open Datasets | Yes | "We evaluate these methods by training ResNet20 model [11] on CIFAR10 and CIFAR100 datasets [14]." |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test splits, nor does it refer to a validation set with quantitative details. |
| Hardware Specification | Yes | "The experiments for the CIFAR10 dataset are conducted on NVIDIA RTX 2080 Ti GPUs. ... The experiments for the CIFAR100 dataset are conducted on NVIDIA V100 GPUs." |
| Software Dependencies | Yes | "All the methods are implemented with PyTorch 1.3." |
| Experiment Setup | Yes | "The number of workers is set to 16 and 64. The batch size on each worker is set to 64. The momentum coefficient is set to 0.9. Each experiment is repeated 5 times. ... For the CIFAR10 dataset, the weight decay is set to 0.0001 and the model is trained for 160 epochs. The learning rate is multiplied by 0.1 at the 80th and 120th epochs. ... For the CIFAR100 dataset, the weight decay is set to 0.0005 and the model is trained for 200 epochs. The learning rate is multiplied by 0.2 at the 60th, 120th, and 160th epochs." See the configuration sketch after this table. |
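The Pseudocode row names Algorithm 4 as naive ASGDm, i.e., the standard way of adding momentum to asynchronous SGD: the server applies a heavy-ball momentum update to each worker gradient as it arrives, however stale that gradient is. The following is a minimal single-process sketch of that server loop, not the paper's implementation; the function name `naive_asgdm_server`, the `grad_stream` simulation, and the learning rate of 0.1 are assumptions for illustration.

```python
import torch

# Hypothetical single-process simulation of the parameter server in naive
# ASGDm: a momentum buffer is updated with each worker gradient as it
# arrives, regardless of how stale the gradient is.
def naive_asgdm_server(grad_stream, param, lr=0.1, momentum=0.9):
    """grad_stream yields (worker_id, gradient) pairs in arrival order."""
    buf = torch.zeros_like(param)       # server-side momentum buffer
    for worker_id, grad in grad_stream:
        buf.mul_(momentum).add_(grad)   # heavy-ball momentum accumulation
        param.add_(buf, alpha=-lr)      # apply the update immediately
    return param

# Toy usage: gradients arrive out of iteration order, mimicking asynchrony.
param = torch.zeros(4)
stream = [(w, torch.randn(4)) for w in [0, 2, 1, 3, 0, 1]]
naive_asgdm_server(iter(stream), param)
```

OrMo (Algorithm 2) is the paper's remedy for the staleness this loop ignores; its details are not reproduced in this table, so no sketch of it is attempted here.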
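The hyperparameters in the Experiment Setup row map directly onto a standard PyTorch training configuration. The sketch below shows one plausible CIFAR10 setup under those numbers; the stand-in model, the base learning rate of 0.1, and the data-loading details are assumptions, since the row does not specify them.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Stand-in module for illustration; the paper trains ResNet20, whose
# constructor is not given in this table.
model = torch.nn.Linear(32 * 32 * 3, 10)

# CIFAR10 settings from the Experiment Setup row: momentum 0.9, weight decay
# 1e-4, 160 epochs, LR multiplied by 0.1 at epochs 80 and 120. The base
# learning rate of 0.1 is an assumption; the row does not state it.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

# CIFAR100 analogue per the same row: weight_decay=5e-4, 200 epochs,
# MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2).

for epoch in range(160):
    # ... one epoch of training: 16 or 64 workers, batch size 64 each ...
    scheduler.step()
```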