Finite-Time Convergence Rates of Decentralized Local Markovian Stochastic Approximation

Authors: Pengfei Wang, Nenggan Zheng

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our theoretical results to multi-task reinforcement learning over multi-agent systems. In particular, we present for the first time decentralized federated variants of the Q-learning algorithm and establish their finite-time convergence rates. Our algorithms perform multiple local updates on each agent before exchanging information over a sparse communication network, which improves communication efficiency. We prove that when the solution accuracy ϵ is small enough, the sample complexity and communication complexity of our algorithms are Õ(1/ϵ) and Õ(1/√ϵ), respectively. Experiments on standard multi-task reinforcement learning tasks are provided to illustrate our theoretical results. (A hedged code sketch of this local-update-then-communicate pattern appears after the table.)
Researcher Affiliation | Academia | Pengfei Wang (1,2,3), Nenggan Zheng (2,3,4,5); 1: School of Software Technology, Zhejiang University, Ningbo, China; 2: Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China; 3: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 4: State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; 5: CCAI by MOE and Zhejiang Provincial Government (ZJU), Hangzhou, China; pfei@zju.edu.cn, zng@cs.zju.edu.cn
Pseudocode | Yes | Algorithm 1 DLMSA... Algorithm 2 C-DLMSA
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | In this section, we aim to support our theoretical results by applying (compressed) decentralized federated Q-learning to solve the multi-task Grid World problem [Zeng et al., 2021a]. (An illustrative toy grid-world environment is sketched after the table.)
Dataset Splits | No | The paper mentions training agents for 1000 episodes and evaluating the learned policy, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications).
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We use the same best-tuned learning rate α = 0.5, communication interval K = 10, and discount factor γ = 0.99 in all experiments. For Comp. DFed. Q-learning, we use top-1% [Stich, 2018] as the compression operator. (A sketch of a top-1% sparsification operator appears after the table.)
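
The abstract quoted in the Research Type row describes the core algorithmic pattern: each agent performs K local Q-learning updates on its own task and only then exchanges (averages) its parameters with its neighbors over a sparse communication network. The code below is a minimal sketch of that pattern, not the authors' implementation (none is released): the tabular environment interface (reset() returning a state index, step(a) returning (next_state, reward, done)), the doubly stochastic mixing matrix W, the epsilon-greedy exploration, and all names are assumptions made for illustration.

import numpy as np

def decentralized_federated_q_learning(envs, W, n_states, n_actions,
                                        alpha=0.5, gamma=0.99, K=10,
                                        rounds=200, eps=0.1, rng=None):
    """envs: one environment per agent (multi-task setting).
    W: doubly stochastic mixing matrix of the sparse communication graph."""
    rng = rng or np.random.default_rng(0)
    n_agents = len(envs)
    Q = np.zeros((n_agents, n_states, n_actions))   # one local Q-table per agent
    states = [env.reset() for env in envs]

    for _ in range(rounds):                          # communication rounds
        for _ in range(K):                           # K local updates between rounds
            for i, env in enumerate(envs):
                s = states[i]
                # epsilon-greedy action from the local Q-table
                if rng.random() < eps:
                    a = int(rng.integers(n_actions))
                else:
                    a = int(np.argmax(Q[i, s]))
                s_next, r, done = env.step(a)
                # standard tabular Q-learning update
                target = r + gamma * np.max(Q[i, s_next]) * (not done)
                Q[i, s, a] += alpha * (target - Q[i, s, a])
                states[i] = env.reset() if done else s_next
        # sparse communication: each agent averages only with its neighbors
        # (W[i, j] > 0 only if j is a neighbor of i)
        Q = np.einsum('ij,jsa->isa', W, Q)
    return Q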
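
The Open Datasets row refers to a multi-task Grid World benchmark [Zeng et al., 2021a]. The toy environment below only illustrates the interface assumed by the sketch above; its size, dynamics, goal placement, and rewards are arbitrary choices and not the benchmark used in the paper.

import numpy as np

class ToyGridWorld:
    """Deterministic grid; states are flattened (row, col) indices."""
    def __init__(self, size=5, goal=(4, 4)):
        self.size = size
        self.goal = goal
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self._state()

    def _state(self):
        return self.pos[0] * self.size + self.pos[1]

    def step(self, action):
        # actions: 0=up, 1=down, 2=left, 3=right; moves are clipped at the walls
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0
        return self._state(), reward, done

Example wiring with the sketch above, using four agents with different goals (mimicking the multi-task setting) and a doubly stochastic ring mixing matrix:

n_agents = 4
envs = [ToyGridWorld(goal=(4, i)) for i in range(n_agents)]   # one task per agent
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])                       # ring graph weights
Q = decentralized_federated_q_learning(envs, W, n_states=25, n_actions=4)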
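
The Experiment Setup row reports the tuned hyperparameters and a top-1% compression operator in the spirit of [Stich, 2018] for Comp. DFed. Q-learning. The snippet below sketches such a top-k sparsification step; the paper's exact operator, the quantity it compresses, and any error-feedback mechanism are not specified in the excerpt, so this is an assumption-laden illustration rather than the paper's code.

import numpy as np

# Hyperparameters reported in the experiment setup.
ALPHA = 0.5   # best-tuned learning rate
K = 10        # communication interval (local updates per round)
GAMMA = 0.99  # discount factor

def top_fraction(x, fraction=0.01):
    """Keep only the largest `fraction` of entries of x by magnitude,
    zeroing the rest (top-1% when fraction=0.01)."""
    flat = x.ravel()
    k = max(1, int(np.ceil(fraction * flat.size)))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # k largest-magnitude entries
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)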