Adaptive Random Walk Gradient Descent for Decentralized Optimization
Authors: Tao Sun, Dongsheng Li, Bao Wang
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experimental Results: We contrast the performance of adaptive and non-adaptive random walk algorithms for training machine learning models, including logistic regression (LR), multi-layer perceptron (MLP), and convolutional neural networks (CNNs). We evaluate the performance of the models on the benchmark MNIST and CIFAR10 image classification tasks, where MNIST/CIFAR10 contains 60K/50K and 10K/10K images for training and test, respectively. |
| Researcher Affiliation | Academia | College of Computer, National University of Defense Technology, Hunan, China; Department of Mathematics and Scientific Computing and Imaging Institute, University of Utah. |
| Pseudocode | Yes (see the illustrative sketch after the table) | Algorithm 1 Adaptive Random Walk Gradient Descent |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate the performance of the models on the benchmark MNIST and CIFAR10 image classification tasks, where MNIST/CIFAR10 contains 60K/50K and 10K/10K images for training and test, respectively. |
| Dataset Splits | No | The paper states 'MNIST/CIFAR10 contains 60K/50K and 10K/10K images for training and test, respectively' and 'We randomly partition the training data into ten even groups in an i.i.d. fashion', but it does not specify details about a validation split. |
| Hardware Specification | No | The paper describes the experimental setup in terms of models, datasets, and hyperparameters, but it does not provide specific hardware details such as GPU/CPU models or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes (see the configuration sketch after the table) | In training, we set the batch size to be 128. We fine-tune the step size for both adaptive and non-adaptive random walk gradient descent, and we use the initial learning rate of 0.003 and 0.1 for adaptive and non-adaptive algorithms, respectively. The momentum hyperparameter is set to 0.9 for both solvers. Moreover, we set the weight decay for both adaptive and non-adaptive algorithms to be 5 × 10⁻⁴. |
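
The Pseudocode row cites Algorithm 1 (Adaptive Random Walk Gradient Descent) but the paper's listing is not reproduced here. The following is a minimal sketch of the general idea under stated assumptions: a single model iterate (the "token") walks over the communication graph, the active node takes an AdaGrad-norm-style adaptive step on its local stochastic gradient, and the token is then handed to a random neighbor. The function names (`adaptive_random_walk_gd`, `grad_fn`, `neighbors`) and the exact step-size rule are hypothetical and may differ from the paper's Algorithm 1; only the initial learning rate of 0.003 is taken from the reported setup.

```python
# Hypothetical sketch only: the update rule, accumulator, and neighbor sampling
# are assumptions in the spirit of Algorithm 1, not a reproduction of it.
import numpy as np

def adaptive_random_walk_gd(grad_fn, neighbors, x0, node0,
                            lr=0.003, eps=1e-8, steps=10_000, seed=0):
    """grad_fn(node, x): stochastic gradient of node's local loss at x.
    neighbors[node]: nodes the token can hop to from `node` in one step."""
    rng = np.random.default_rng(seed)
    x, node = np.asarray(x0, dtype=float).copy(), node0
    accum = 0.0                                # running sum of squared gradient norms
    for _ in range(steps):
        g = grad_fn(node, x)                   # local stochastic gradient at the active node
        accum += float(np.dot(g, g))           # AdaGrad-norm-style accumulator
        x -= lr / (np.sqrt(accum) + eps) * g   # adaptive step on the shared iterate
        node = rng.choice(neighbors[node])     # pass the token to a random neighbor
    return x

if __name__ == "__main__":
    # Toy usage: 10 nodes on a ring, each with local loss 0.5 * ||x - c_i||^2.
    centers = np.random.default_rng(1).normal(size=(10, 5))
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    x_final = adaptive_random_walk_gd(lambda i, x: x - centers[i], ring,
                                      x0=np.zeros(5), node0=0)
```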
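
The Dataset Splits and Experiment Setup rows pin down a batch size of 128, initial learning rates of 0.003 (adaptive) and 0.1 (non-adaptive), momentum 0.9, weight decay 5 × 10⁻⁴, and an i.i.d. partition of the training data into ten even groups. Since no code is released, the PyTorch sketch below is an illustration of that configuration only: the MNIST pipeline, the stand-in MLP, the random seed, and the mapping of the non-adaptive baseline onto `torch.optim.SGD` are assumptions, not the authors' implementation.

```python
# Hypothetical PyTorch setup mirroring the reported hyperparameters; the model,
# data pipeline, and optimizer mapping are assumptions for illustration only.
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())

# "We randomly partition the training data into ten even groups in an i.i.d. fashion"
num_nodes = 10
lengths = [len(train_set) // num_nodes] * num_nodes
node_datasets = random_split(train_set, lengths,
                             generator=torch.Generator().manual_seed(0))
node_loaders = [DataLoader(ds, batch_size=128, shuffle=True)  # batch size 128
                for ds in node_datasets]

model = torch.nn.Sequential(          # stand-in MLP; the paper also uses LR and CNNs
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

# Non-adaptive baseline with the reported hyperparameters (lr 0.1, momentum 0.9,
# weight decay 5e-4); the adaptive variant starts from lr 0.003 with its own rule.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
```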