Finite-Time Convergence Rates of Decentralized Local Markovian Stochastic Approximation
Authors: Pengfei Wang, Nenggan Zheng
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our theoretical results to multi-task reinforcement learning over multi-agent systems. In particular, we present for the first time decentralized federated variants of the Q-learning algorithm and establish its finite-time convergence rates. Performing multiple local updates on agents before exchanging information via sparse communication network is adopted in our algorithms to improve communication efficiency. We prove that when the solution accuracy ϵ is small enough, the sample complexity and communication complexity of our algorithms are Õ(1/ϵ) and Õ(1/√ϵ), respectively. Experiments on standard multi-task reinforcement learning tasks are provided to illustrate our theoretical results. |
| Researcher Affiliation | Academia | Pengfei Wang (1,2,3), Nenggan Zheng (2,3,4,5): 1. School of Software Technology, Zhejiang University, Ningbo, China; 2. Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China; 3. College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 4. State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; 5. CCAI by MOE and Zhejiang Provincial Government (ZJU), Hangzhou, China; pfei@zju.edu.cn, zng@cs.zju.edu.cn |
| Pseudocode | Yes | Algorithm 1 DLMSA... Algorithm 2 C-DLMSA |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | In this section, we aim to support our theoretical results by applying (compressed) decentralized federated Q-learning to solve the multi-task Grid World problem [Zeng et al., 2021a]. |
| Dataset Splits | No | The paper mentions training agents for 1000 episodes and evaluating the learned policy, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications). |
| Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We use the same best-tuned learning rate α = 0.5, communication interval K = 10, and discount factor γ = 0.99 in all experiments. For Comp. DFed. Q-learning, we use top-1% [Stich, 2018] as the compression operator. (A hedged code sketch of these settings follows the table.) |
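
To make the local-update-then-communicate structure referenced in the Pseudocode and Experiment Setup rows concrete, below is a minimal NumPy sketch of decentralized federated Q-learning with K local updates per communication round and optional top-1%-style sparsification. This is not the paper's DLMSA/C-DLMSA implementation: the environment interface (`env.reset`, `env.step`), the ring mixing matrix, the random behavior policy, and applying compression directly to the Q-iterate (rather than, e.g., compressed differences with error feedback) are illustrative assumptions. Only α = 0.5, K = 10, and γ = 0.99 are taken from the reported setup.

```python
import numpy as np

# Sketch of decentralized federated Q-learning: each agent performs K local
# Q-learning updates on its own task, then agents gossip-average their
# Q-tables over a sparse mixing matrix. Environment API, mixing topology,
# behavior policy, and the compression placement are assumptions.

ALPHA, K, GAMMA = 0.5, 10, 0.99   # best-tuned values reported in the paper


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring topology (assumption)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = 0.25
        W[i, (i + 1) % n] = 0.25
    return W


def top_fraction(x, frac=0.01):
    """Keep only the largest-magnitude `frac` of entries (top-1% style)."""
    flat = x.ravel()
    k = max(1, int(frac * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)


def decentralized_q_learning(envs, n_states, n_actions, rounds=100, compress=False):
    """Run `rounds` cycles of K local updates followed by one gossip step."""
    n = len(envs)
    W = ring_mixing_matrix(n)
    Q = np.zeros((n, n_states, n_actions))
    states = [env.reset() for env in envs]          # hypothetical env API
    for _ in range(rounds):
        # Local phase: K Markovian Q-learning updates per agent.
        for i, env in enumerate(envs):
            s = states[i]
            for _ in range(K):
                a = np.random.randint(n_actions)    # random behavior policy (assumption)
                s_next, r, done = env.step(a)
                target = r + GAMMA * Q[i, s_next].max()
                Q[i, s, a] += ALPHA * (target - Q[i, s, a])
                s = env.reset() if done else s_next
            states[i] = s
        # Communication phase: gossip average of (optionally compressed) iterates.
        msgs = np.array([top_fraction(Q[i]) if compress else Q[i] for i in range(n)])
        Q = np.einsum('ij,jsa->isa', W, msgs)
    return Q.mean(axis=0)
```

With `compress=False` the sketch corresponds to the uncompressed (DFed.) variant; `compress=True` mimics the top-1% sparsified communication of the compressed (Comp. DFed.) variant under the simplifying assumptions noted above.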