Adaptive Random Walk Gradient Descent for Decentralized Optimization

Authors: Tao Sun, Dongsheng Li, Bao Wang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5, Experimental Results: "We contrast the performance of adaptive and non-adaptive random walk algorithms for training machine learning models, including logistic regression (LR), multi-layer perceptron (MLP), and convolutional neural networks (CNNs). We evaluate the performance of the models on the benchmark MNIST and CIFAR10 image classification tasks, where MNIST/CIFAR10 contains 60K/50K and 10K/10K images for training and test, respectively."
Researcher Affiliation | Academia | (1) College of Computer, National University of Defense Technology, Hunan, China; (2) Department of Mathematics and Scientific Computing and Imaging Institute, University of Utah.
Pseudocode | Yes | "Algorithm 1 Adaptive Random Walk Gradient Descent" (an illustrative random-walk update sketch follows this table).
Open Source Code | No | The paper does not provide any explicit statement or link indicating that source code for the described methodology is publicly available.
Open Datasets | Yes | "We evaluate the performance of the models on the benchmark MNIST and CIFAR10 image classification tasks, where MNIST/CIFAR10 contains 60K/50K and 10K/10K images for training and test, respectively."
Dataset Splits | No | The paper states that "MNIST/CIFAR10 contains 60K/50K and 10K/10K images for training and test, respectively" and that "We randomly partition the training data into ten even groups in an i.i.d. fashion", but it does not specify a validation split (an illustrative partition sketch follows this table).
Hardware Specification | No | The paper describes the experimental setup in terms of models, datasets, and hyperparameters, but it does not give specific hardware details such as the GPU/CPU models or memory used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as the programming language, libraries, or frameworks used for the implementation.
Experiment Setup | Yes | "In training, we set the batch size to be 128. We fine-tune the step size for both adaptive and non-adaptive random walk gradient descent, and we use the initial learning rate of 0.003 and 0.1 for adaptive and non-adaptive algorithms, respectively. The momentum hyperparameter is set to 0.9 for both solvers. Moreover, we set the weight decay for both adaptive and non-adaptive algorithms to be 5 × 10^−4." (An illustrative optimizer configuration follows this table.)
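
The Pseudocode row points to Algorithm 1 of the paper, which is not reproduced on this page. The sketch below only illustrates the general shape of random-walk gradient descent, where a single token carrying the iterate hops between neighboring nodes and each visited node applies a local stochastic-gradient step. The AdaGrad-style adaptive scaling, the uniform neighbor sampling, and all names (random_walk_gd, local_grad, neighbors) are illustrative assumptions, not the paper's exact update rule.

    import numpy as np

    def random_walk_gd(x0, neighbors, local_grad, steps=1000, lr=0.003, eps=1e-8):
        """Sketch of a random-walk gradient descent loop (assumptions, not Algorithm 1 verbatim)."""
        x = x0.astype(float)
        node = 0                                      # the walk starts at an arbitrary node
        accum = np.zeros_like(x)                      # running sum of squared gradients
        for _ in range(steps):
            g = local_grad(node, x)                   # stochastic gradient on the visited node's data shard
            accum += g * g
            x -= lr * g / (np.sqrt(accum) + eps)      # adaptive (AdaGrad-like) step; an assumption
            node = np.random.choice(neighbors[node])  # token moves to a uniformly random neighbor
        return x

Here neighbors maps each node index to a list of adjacent node indices, and local_grad(node, x) evaluates a stochastic gradient on that node's local data shard.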
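The Dataset Splits row quotes a random i.i.d. partition of the training data into ten even groups, one per node. A minimal sketch of such a partition, with the seed and the NumPy-based implementation as assumptions, could look like this:

    import numpy as np

    def iid_partition(num_samples, num_nodes=10, seed=0):
        """Shuffle all sample indices and split them into num_nodes (nearly) even groups."""
        rng = np.random.default_rng(seed)
        shuffled = rng.permutation(num_samples)
        return np.array_split(shuffled, num_nodes)

    groups = iid_partition(60_000)          # e.g. MNIST's 60K training images
    print([len(g) for g in groups])         # ten groups of 6000 indices each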
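The Experiment Setup row pins down the batch size, initial learning rates, momentum, and weight decay, but not the framework. A hypothetical PyTorch-style configuration consistent with those numbers (the placeholder model and the use_adaptive switch are assumptions) might be:

    import torch

    use_adaptive = True                      # toggle between the adaptive and non-adaptive runs
    model = torch.nn.Linear(28 * 28, 10)     # placeholder model, e.g. logistic regression on MNIST

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.003 if use_adaptive else 0.1,   # quoted initial learning rates
        momentum=0.9,                        # quoted momentum for both solvers
        weight_decay=5e-4,                   # quoted weight decay for both solvers
    )
    batch_size = 128                         # quoted batch size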