On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

Authors: Weichao Mao, Lin Yang, Kaiqing Zhang, Tamer Basar

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we provide numerical simulations to corroborate our theoretical findings.
Researcher Affiliation | Collaboration | 1 Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois Urbana-Champaign. 2 Department of Electrical and Computer Engineering, University of California, Los Angeles. Part of this work was done while the author was visiting DeepMind. 3 Laboratory for Information & Decision Systems, Massachusetts Institute of Technology. Part of this work was done while the author was visiting the Simons Institute for the Theory of Computing.
Pseudocode | Yes | Algorithm 1: Stage-Based V-Learning for CCE (agent i)
Open Source Code | No | The paper does not include any explicit statement about releasing the source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We use a classic matrix team example from the literature (Claus & Boutilier, 1998; Lauer & Riedmiller, 2000)...
Dataset Splits | No | The paper describes episodic reinforcement learning settings (e.g., 'K = 50000 episodes, each episode containing H = 10 steps') but does not specify fixed train/validation/test data splits as would be typical for a supervised learning setup.
Hardware Specification | No | The paper does not specify the hardware used to run its experiments, such as exact GPU/CPU models, processor types, or memory.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) used for the experiments.
Experiment Setup | Yes | We run Algorithm 3 on this task for T = 5000 rounds, and we set the step size η_t = 10^-4 and the momentum parameter a_t = 0.5.
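The Pseudocode row above refers to the paper's Algorithm 1, "Stage-Based V-Learning for CCE (agent i)", which each agent runs independently. The paper itself is the authoritative reference for that pseudocode; the snippet below is only a minimal, illustrative Python sketch of the general stage-based V-learning pattern it names, not a reproduction of the authors' algorithm. The class name, the bonus constant `c_bonus`, the stage growth factor, and the bandit step size are placeholder assumptions.

```python
import numpy as np
from collections import defaultdict


class StageBasedVLearningAgent:
    """Illustrative sketch of a stage-based, V-learning-style update for one agent.

    NOT the paper's exact Algorithm 1: the bonus constant, stage growth factor,
    and bandit step size below are placeholder assumptions.
    """

    def __init__(self, n_actions, H, c_bonus=1.0, stage_growth=2.0, bandit_step=0.01):
        self.n_actions = n_actions
        self.H = H                          # episode horizon (H = 10 in the paper's experiments)
        self.c_bonus = c_bonus              # optimism bonus scale (placeholder)
        self.stage_growth = stage_growth    # stages grow geometrically (placeholder factor)
        self.bandit_step = bandit_step      # exponential-weights step size (placeholder)
        self.V = defaultdict(lambda: float(H))                   # optimistic value estimate per (h, s)
        self.logits = defaultdict(lambda: np.zeros(n_actions))   # bandit weights per (h, s)
        self.stage_targets = defaultdict(list)                   # r + V_{h+1}(s') samples in current stage
        self.stage_len = defaultdict(lambda: 1)                  # current stage length per (h, s)

    def policy(self, h, s):
        """Exponential-weights distribution over this agent's own actions at (h, s)."""
        z = self.logits[(h, s)]
        p = np.exp(z - z.max())
        return p / p.sum()

    def act(self, h, s, rng):
        return int(rng.choice(self.n_actions, p=self.policy(h, s)))

    def update(self, h, s, a, r, s_next):
        """Record one transition; refresh V and the policy when the current stage ends."""
        key = (h, s)
        v_next = 0.0 if h + 1 >= self.H else self.V[(h + 1, s_next)]
        self.stage_targets[key].append(r + v_next)

        # Adversarial-bandit policy update using an importance-weighted loss estimate.
        p = self.policy(h, s)
        loss_hat = np.zeros(self.n_actions)
        loss_hat[a] = (self.H - (r + v_next)) / max(p[a], 1e-8)
        self.logits[key] -= self.bandit_step * loss_hat

        # End of stage: average the buffered targets, add an optimism bonus,
        # keep the estimate monotone, and start a geometrically longer stage.
        if len(self.stage_targets[key]) >= self.stage_len[key]:
            n = len(self.stage_targets[key])
            bonus = self.c_bonus * self.H * np.sqrt(1.0 / n)
            self.V[key] = min(self.V[key], float(self.H),
                              float(np.mean(self.stage_targets[key])) + bonus)
            self.stage_targets[key].clear()
            self.stage_len[key] = int(np.ceil(self.stage_growth * self.stage_len[key]))
```

A driver loop would call `act` and `update` once per step for each agent over the K episodes of an episodic task; only the agent's own actions and received rewards are used, which is what makes the scheme decentralized.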
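For quick reference, the numeric settings quoted in the Dataset Splits and Experiment Setup rows can be collected as below; the dictionary and its key names are my own bookkeeping, not the paper's.

```python
# Hyperparameters quoted in the table above; key names are illustrative, not the paper's.
reported_setup = {
    "episodes_K": 50_000,    # "K = 50000 episodes"
    "horizon_H": 10,         # "each episode containing H = 10 steps"
    "rounds_T": 5_000,       # Algorithm 3 run length on the matrix team example
    "step_size_eta": 1e-4,   # step size η_t = 10^-4
    "momentum_a": 0.5,       # momentum parameter a_t = 0.5
}
```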