Faster Adaptive Decentralized Learning Algorithms
Authors: Feihu Huang, Jianyu Zhao
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct some numerical experiments on training nonconvex machine learning tasks to verify the efficiency of our proposed algorithms. |
| Researcher Affiliation | Academia | (1) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; (2) MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, China. |
| Pseudocode | Yes | Algorithm 1: Adaptive Momentum-Based Decentralized Optimization (AdaMDOS) Algorithm for Stochastic Optimization; Algorithm 2: Adaptive Momentum-Based Decentralized Optimization (AdaMDOF) Algorithm for Finite-Sum Optimization |
| Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link, explicit code release statement, or code in supplementary materials. |
| Open Datasets | Yes | We use the public w8a and covertype datasets (available at https://www.openml.org/), the MNIST dataset (LeCun et al., 2010), and the Tiny-ImageNet dataset (Le & Yang, 2015). |
| Dataset Splits | No | The paper specifies training and testing examples/splits for some datasets (e.g., MNIST, Tiny-Image Net) but does not provide explicit details about a validation dataset split or cross-validation setup. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. It only mentions a "decentralized network" and "clients" or "nodes" without further hardware specifications. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes (see the configuration sketch below the table) | In the experiment, we set the regularization parameter λ = 10^-5 and use the same initial solution x_0 = x_0^i = 0.01·ones(d, 1) for all i ∈ [m] in all algorithms. For a fair comparison, we use batch size b = 10 in all algorithms, set β_1 = β_2 = 0.9 in DADAM (Nazari et al., 2022) and DAMSGrad (Chen et al., 2023), set β_1 = 0.9 in DAdaGrad (Chen et al., 2023), and set ϱ = β_t = η_t = 0.9 for all t ≥ 1 in our algorithms. |
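
The experiment setup row is essentially a small hyperparameter configuration. The following Python sketch restates the reported values in code; since the authors have not released their implementation, the names `HYPERPARAMS`, `make_initial_point`, and `load_openml_dataset`, as well as the dimension `d`, client count `m`, and the exact OpenML dataset names, are illustrative assumptions rather than the authors' own code.

```python
import numpy as np
from sklearn.datasets import fetch_openml  # assumption: scikit-learn is available

# Hyperparameters as reported in the experiment setup above.
HYPERPARAMS = {
    "lambda_reg": 1e-5,  # regularization parameter λ
    "batch_size": 10,    # mini-batch size b, shared by all compared algorithms
    "beta1": 0.9,        # β_1 used in DADAM, DAMSGrad, and DAdaGrad
    "beta2": 0.9,        # β_2 used in DADAM and DAMSGrad
    "rho": 0.9,          # ϱ = β_t = η_t = 0.9 for all t ≥ 1 in AdaMDOS / AdaMDOF
}

def make_initial_point(d: int, num_clients: int) -> np.ndarray:
    """Same initial solution x_0^i = 0.01 * ones(d, 1) for every client i in [m]."""
    return np.tile(0.01 * np.ones((d, 1)), (1, num_clients))

def load_openml_dataset(name: str):
    """Fetch one of the public tabular datasets (e.g. w8a, covertype) from OpenML.

    The dataset names on OpenML are an assumption and may need adjusting.
    """
    bunch = fetch_openml(name=name, as_frame=False)
    return bunch.data, bunch.target

if __name__ == "__main__":
    x0 = make_initial_point(d=300, num_clients=20)  # d and m chosen for illustration
    print(HYPERPARAMS)
    print("initial point shape:", x0.shape)
```

The image datasets cited in the paper (MNIST, Tiny-ImageNet) would need to be loaded separately, e.g. via a deep-learning data library, which the paper does not specify.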