Faster Adaptive Decentralized Learning Algorithms

Authors: Feihu Huang, Jianyu Zhao

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct some numerical experiments on training nonconvex machine learning tasks to verify the efficiency of our proposed algorithms.
Researcher Affiliation | Academia | 1) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; 2) MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, China.
Pseudocode | Yes | Algorithm 1: Adaptive Momentum-Based Decentralized Optimization (AdaMDOS) Algorithm for Stochastic Optimization; Algorithm 2: Adaptive Momentum-Based Decentralized Optimization (AdaMDOF) Algorithm for Finite-Sum Optimization.
Open Source Code | No | The paper does not provide concrete access to source code, such as a specific repository link, an explicit code release statement, or code in the supplementary materials.
Open Datasets | Yes | We use the public w8a and covertype datasets (available at https://www.openml.org/), the MNIST dataset (LeCun et al., 2010), and the Tiny-ImageNet dataset (Le & Yang, 2015).
Dataset Splits | No | The paper specifies training and testing examples/splits for some datasets (e.g., MNIST, Tiny-ImageNet) but does not provide explicit details about a validation split or a cross-validation setup.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used to run its experiments; it only mentions a "decentralized network" with "clients" or "nodes", without further hardware specifications.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | In the experiments, we set the regularization parameter λ = 10^-5 and use the same initial solution x0 = x_i^0 = 0.01 * ones(d, 1) for all i ∈ [m] in all algorithms. For a fair comparison, we use the batch size b = 10 in all algorithms, set β1 = β2 = 0.9 in DADAM (Nazari et al., 2022) and DAMSGrad (Chen et al., 2023), set β1 = 0.9 in DAdaGrad (Chen et al., 2023), and set ϱ = βt = ηt = 0.9 for all t ≥ 1 in our algorithms.
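
The Experiment Setup row pins down the shared hyperparameters. Below is a minimal configuration sketch in Python, not the authors' code (none is released); the variable names, the dimension d, and the number of nodes m are illustrative assumptions, while the numerical values are taken directly from the row above.

```python
# Minimal configuration sketch for the reported experiment setup.
# All names below (config keys, d, m) are illustrative assumptions;
# only the numerical values come from the paper's description.
import numpy as np

d = 300   # problem dimension (assumption; depends on the dataset, e.g. w8a)
m = 20    # number of clients/nodes in the decentralized network (assumption)

config = {
    "reg_lambda": 1e-5,   # regularization parameter lambda = 10^-5
    "batch_size": 10,     # mini-batch size b shared by all algorithms
    # DADAM and DAMSGrad momentum parameters; DAdaGrad uses only beta1 = 0.9
    "beta1": 0.9,
    "beta2": 0.9,
    # AdaMDOS / AdaMDOF parameters: varrho = beta_t = eta_t = 0.9 for all t >= 1
    "varrho": 0.9,
    "beta_t": 0.9,
    "eta_t": 0.9,
}

# Same initial solution on every client: x_i^0 = 0.01 * ones(d, 1) for all i in [m]
x0 = 0.01 * np.ones((d, 1))
X_init = [x0.copy() for _ in range(m)]
```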
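
The Open Datasets row states only that the w8a and covertype data are available at https://www.openml.org/. As a convenience, here is a hypothetical loading sketch using scikit-learn's fetch_openml; the exact OpenML dataset names and versions are assumptions, not confirmed by the paper.

```python
# Hypothetical loading sketch for the OpenML-hosted datasets named in the paper.
# The dataset identifiers below are assumptions; the paper only states that
# w8a and covertype are available at https://www.openml.org/.
from sklearn.datasets import fetch_openml

# Fetch by name; a version warning may appear if several versions exist on OpenML.
w8a = fetch_openml(name="w8a", as_frame=False)
covertype = fetch_openml(name="covertype", as_frame=False)

X_w8a, y_w8a = w8a.data, w8a.target
X_cov, y_cov = covertype.data, covertype.target
print(X_w8a.shape, X_cov.shape)
```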