Finite-Time Convergence Rates of Decentralized Local Markovian Stochastic Approximation

Authors: Pengfei Wang, Nenggan Zheng

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our theoretical results to multi-task reinforcement learning over multi-agent systems. In particular, we present for the first time decentralized federated variants of the Q-learning algorithm and establish their finite-time convergence rates. Our algorithms perform multiple local updates on each agent before exchanging information over a sparse communication network, which improves communication efficiency. We prove that when the solution accuracy ϵ is small enough, the sample complexity and communication complexity of our algorithms are Õ(1/ϵ) and Õ(1/√ϵ), respectively. Experiments on standard multi-task reinforcement learning tasks are provided to illustrate our theoretical results. (A hedged code sketch of this local-update-then-communicate pattern appears after the table.)
Researcher Affiliation | Academia | Pengfei Wang (1,2,3), Nenggan Zheng (2,3,4,5); 1: School of Software Technology, Zhejiang University, Ningbo, China; 2: Qiushi Academy for Advanced Studies, Zhejiang University, Hangzhou, China; 3: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 4: State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; 5: CCAI by MOE and Zhejiang Provincial Government (ZJU), Hangzhou, China; pfei@zju.edu.cn, zng@cs.zju.edu.cn
Pseudocode | Yes | Algorithm 1 DLMSA... Algorithm 2 C-DLMSA
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | In this section, we aim to support our theoretical results by applying (compressed) decentralized federated Q-learning to solve the multi-task Grid World problem [Zeng et al., 2021a]. (An illustrative toy grid-world environment is sketched after the table.)
Dataset Splits | No | The paper mentions training agents for 1000 episodes and evaluating the learned policy, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory specifications).
Software Dependencies | No | The paper does not mention any specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | We use the same best-tuned learning rate α = 0.5, communication interval K = 10, and discount factor γ = 0.99 in all experiments. For Comp. DFed. Q-learning, we use top-1% [Stich, 2018] as the compression operator. (A sketch of a top-1% sparsification operator appears after the table.)
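
The abstract quoted in the Research Type row describes the core algorithmic pattern: each agent performs K local Q-learning updates on its own task and only then exchanges (averages) its parameters with its neighbors over a sparse communication network. The code below is a minimal sketch of that pattern, not the authors' implementation (none is released): the tabular environment interface (reset() returning a state index, step(a) returning (next_state, reward, done)), the doubly stochastic mixing matrix W, the epsilon-greedy exploration, and all names are assumptions made for illustration.

import numpy as np

def decentralized_federated_q_learning(envs, W, n_states, n_actions,
                                        alpha=0.5, gamma=0.99, K=10,
                                        rounds=200, eps=0.1, rng=None):
    """envs: one environment per agent (multi-task setting).
    W: doubly stochastic mixing matrix of the sparse communication graph."""
    rng = rng or np.random.default_rng(0)
    n_agents = len(envs)
    Q = np.zeros((n_agents, n_states, n_actions))   # one local Q-table per agent
    states = [env.reset() for env in envs]

    for _ in range(rounds):                          # communication rounds
        for _ in range(K):                           # K local updates between rounds
            for i, env in enumerate(envs):
                s = states[i]
                # epsilon-greedy action from the local Q-table
                if rng.random() < eps:
                    a = int(rng.integers(n_actions))
                else:
                    a = int(np.argmax(Q[i, s]))
                s_next, r, done = env.step(a)
                # standard tabular Q-learning update
                target = r + gamma * np.max(Q[i, s_next]) * (not done)
                Q[i, s, a] += alpha * (target - Q[i, s, a])
                states[i] = env.reset() if done else s_next
        # sparse communication: each agent averages only with its neighbors
        # (W[i, j] > 0 only if j is a neighbor of i)
        Q = np.einsum('ij,jsa->isa', W, Q)
    return Q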
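
The Open Datasets row refers to a multi-task Grid World benchmark [Zeng et al., 2021a]. The toy environment below only illustrates the interface assumed by the sketch above; its size, dynamics, goal placement, and rewards are arbitrary choices and not the benchmark used in the paper.

import numpy as np

class ToyGridWorld:
    """Deterministic grid; states are flattened (row, col) indices."""
    def __init__(self, size=5, goal=(4, 4)):
        self.size = size
        self.goal = goal
        self.pos = (0, 0)

    def reset(self):
        self.pos = (0, 0)
        return self._state()

    def _state(self):
        return self.pos[0] * self.size + self.pos[1]

    def step(self, action):
        # actions: 0=up, 1=down, 2=left, 3=right; moves are clipped at the walls
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (r, c)
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0
        return self._state(), reward, done

Example wiring with the sketch above, using four agents with different goals (mimicking the multi-task setting) and a doubly stochastic ring mixing matrix:

n_agents = 4
envs = [ToyGridWorld(goal=(4, i)) for i in range(n_agents)]   # one task per agent
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])                       # ring graph weights
Q = decentralized_federated_q_learning(envs, W, n_states=25, n_actions=4)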
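
The Experiment Setup row reports the tuned hyperparameters and a top-1% compression operator in the spirit of [Stich, 2018] for Comp. DFed. Q-learning. The snippet below sketches such a top-k sparsification step; the paper's exact operator, the quantity it compresses, and any error-feedback mechanism are not specified in the excerpt, so this is an assumption-laden illustration rather than the paper's code.

import numpy as np

# Hyperparameters reported in the experiment setup.
ALPHA = 0.5   # best-tuned learning rate
K = 10        # communication interval (local updates per round)
GAMMA = 0.99  # discount factor

def top_fraction(x, fraction=0.01):
    """Keep only the largest `fraction` of entries of x by magnitude,
    zeroing the rest (top-1% when fraction=0.01)."""
    flat = x.ravel()
    k = max(1, int(np.ceil(fraction * flat.size)))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # k largest-magnitude entries
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(x.shape)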