A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning
Authors: Gugan Chandrashekhar Thoppe, Bhumesh Kumar
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable. As an application, we show that, for the stepsize $n^{-\gamma}$ with $\gamma \in (0, 1)$, the distributed TD(0) algorithm with linear function approximation has a convergence rate of $O(\sqrt{n^{-\gamma} \ln n})$ a.s.; for the $1/n$ type stepsize, the same is $O(\sqrt{n^{-1} \ln \ln n})$ a.s. These decay rates do not depend on the graph depicting the interactions among the different agents. [These stepsize and rate expressions are restated in display form after the table.] |
| Researcher Affiliation | Academia | Gugan Thoppe, Computer Science and Automation, Indian Institute of Science, Bengaluru, Karnataka 560012, India, gthoppe@iisc.ac.in; Bhumesh Kumar, Electrical and Computer Engineering, University of Wisconsin at Madison, Madison, WI 53706, USA, bkumar@wisc.edu |
| Pseudocode | No | The paper describes its algorithms and their update rules using mathematical equations (e.g., equations (2) and (13)), but it does not present structured pseudocode or an explicitly labeled algorithm block. (An illustrative sketch of a distributed TD(0)-style update appears after the table.) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not use or reference any specific dataset for empirical evaluation. It refers to $(s_n, a_n, s'_n)$, $n \geq 0$, as IID samples of $(s, a, s')$, but this is a theoretical assumption for the analysis, not a dataset for experimentation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper focuses on theoretical analysis and mathematical proofs. There is no mention of specific hardware used because no empirical experiments were conducted. |
| Software Dependencies | No | The paper is theoretical and does not describe software dependencies with version numbers as it does not report on empirical experiments requiring specific software setups. |
| Experiment Setup | No | The paper is theoretical and focuses on mathematical derivations and analysis. It does not describe an experimental setup with hyperparameters or system-level training settings. |
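
For readability, the convergence rates quoted in the Research Type row can be written in display form. The notation $\theta_n$ for the iterate and $\theta^*$ for its limit is ours, not the paper's, and the closing remark about the classical LIL envelope is a gloss rather than a claim from the paper.

```latex
% Rates quoted in the abstract for distributed TD(0) with linear
% function approximation (theta_n, theta^* are our notation, not the paper's):
\begin{align*}
  \alpha_n = n^{-\gamma},\ \gamma \in (0,1):
    &\quad \lVert \theta_n - \theta^* \rVert = O\!\left(\sqrt{n^{-\gamma}\,\ln n}\right) \ \text{a.s.},\\
  \alpha_n = \Theta(1/n):
    &\quad \lVert \theta_n - \theta^* \rVert = O\!\left(\sqrt{n^{-1}\,\ln \ln n}\right) \ \text{a.s.}
\end{align*}
```

The $1/n$ case mirrors the $\sqrt{\ln\ln n / n}$ envelope of the classical law of the iterated logarithm for sample averages.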
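Since the paper provides no pseudocode, the following is a minimal, hedged sketch of a gossip-based distributed TD(0) update with linear function approximation and the stepsize $\alpha_n = n^{-\gamma}$. It is for illustration only: the synthetic Markov reward process, the agent and feature dimensions, and all variable names are our assumptions and do not reproduce the paper's equations (2) and (13) exactly.

```python
"""Illustrative sketch (not the paper's exact scheme): gossip-based
distributed TD(0) with linear function approximation, run with the
stepsize alpha_n = n^{-gamma} discussed in the abstract."""
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_STATES, N_FEATURES = 4, 10, 3   # assumed sizes for this example
DISCOUNT, GAMMA_STEP = 0.9, 0.75            # GAMMA_STEP plays the role of gamma in (0, 1)

# Synthetic Markov reward process shared by all agents.
P = rng.random((N_STATES, N_STATES))
P /= P.sum(axis=1, keepdims=True)           # row-stochastic transition matrix
Phi = rng.standard_normal((N_STATES, N_FEATURES))  # state features
R = rng.random((N_AGENTS, N_STATES))        # each agent observes its own reward

# Row-stochastic (not necessarily doubly stochastic) gossip matrix.
W = rng.random((N_AGENTS, N_AGENTS))
W /= W.sum(axis=1, keepdims=True)

theta = np.zeros((N_AGENTS, N_FEATURES))    # one parameter vector per agent

for n in range(1, 100_000):
    alpha = n ** (-GAMMA_STEP)              # stepsize alpha_n = n^{-gamma}

    # IID sample of (s, s'): s drawn uniformly, s' from the transition kernel.
    s = rng.integers(N_STATES)
    s_next = rng.choice(N_STATES, p=P[s])

    # Gossip (consensus) step followed by a local TD(0) step at each agent,
    # where the TD error is computed at the agent's pre-mixing parameters.
    mixed = W @ theta
    for i in range(N_AGENTS):
        td_error = (R[i, s]
                    + DISCOUNT * Phi[s_next] @ theta[i]
                    - Phi[s] @ theta[i])
        mixed[i] += alpha * td_error * Phi[s]
    theta = mixed

print("final per-agent parameters:\n", theta)
```

The gossip matrix `W` here is only row stochastic, reflecting the abstract's remark that double stochasticity is not required, and the sampling of `(s, s_next)` corresponds to the IID-samples assumption noted in the Open Datasets row.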