Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Authors: Ziyi Chen, Yi Zhou, Rong-Rong Chen, Shaofeng Zou

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, University of Utah. (2) Department of Electrical Engineering, University at Buffalo.
Pseudocode | Yes | Algorithm 1 Decentralized Actor-Critic; Algorithm 2 Decentralized TD (critic update); Algorithm 3 Decentralized Natural Actor-Critic
Open Source Code | No | The paper does not include an unambiguous statement or link indicating the release of source code for the methodology described.
Open Datasets | No | The paper describes experiments in simulated environments (e.g., "decentralized ring network", "fully connected network", "two-agent Cliff Navigation environment") rather than using a publicly available dataset with a specific link or citation.
Dataset Splits | No | The paper describes experiments in simulated environments and evaluates performance over iterations, but it does not specify explicit train/validation/test dataset splits typical for supervised learning tasks.
Hardware Specification | No | The paper describes the simulation setup and hyperparameters but does not provide any specific details about the hardware (e.g., GPU/CPU models) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | For our Algorithm 1, we choose T = 500, T_c = 50, T_c' = 10, N_c = 10, T' = T_z = 5, β = 0.5, {σ_m}_{m=1}^{6} = 0.1, and consider batch size choices N = 100, 500, 2000. Algorithm 3 uses the same hyperparameters as those of Algorithm 1 except that T = 2000 in Algorithm 3. We select α = 10, 50, 200 for Algorithm 1 with N = 100, 500, 2000, respectively, and T_z = 5, α = 0.1, 0.5, 2, η = 0.04, 0.2, 0.8, K = 50, 100, 200, N_k = 2, 5, 10 for Algorithm 3 with N = 100, 500, 2000, respectively.
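
Since no source code is released, the quoted hyperparameters may be easier to reuse when collected in one place. The following is a minimal sketch of how they could be organized per batch size; it is not the authors' code. The dictionary layout and the function names alg1_config and alg3_config are hypothetical, and the keys T_prime and T_c_prime assume that the two repeated symbols in the quoted text (a second "T" and a second "Tc") are primed variants that lost their prime in extraction.

```python
# Hyperparameters quoted in the Experiment Setup row, grouped by batch size N.
# Layout and names are illustrative assumptions, not taken from the paper's code.

BATCH_SIZES = (100, 500, 2000)

# Values shared by Algorithm 1 and Algorithm 3 according to the quoted text.
# T_prime and T_c_prime are ASSUMED names for the symbols that appear as a
# repeated "T" and "T c" in the extracted text.
SHARED = {
    "T_c": 50,
    "T_c_prime": 10,
    "N_c": 10,
    "T_prime": 5,
    "T_z": 5,
    "beta": 0.5,
    "sigma": 0.1,  # the same value is quoted for all six sigma_m
}

# Algorithm 1: alpha is chosen per batch size.
ALG1_ALPHA = dict(zip(BATCH_SIZES, (10, 50, 200)))

# Algorithm 3: alpha, eta, K and N_k are chosen per batch size.
ALG3_PER_BATCH = {
    n: {"alpha": a, "eta": e, "K": k, "N_k": nk}
    for n, a, e, k, nk in zip(
        BATCH_SIZES,
        (0.1, 0.5, 2),
        (0.04, 0.2, 0.8),
        (50, 100, 200),
        (2, 5, 10),
    )
}


def alg1_config(n):
    """Return the quoted Algorithm 1 settings for batch size n."""
    return {**SHARED, "T": 500, "N": n, "alpha": ALG1_ALPHA[n]}


def alg3_config(n):
    """Return the quoted Algorithm 3 settings for batch size n (T = 2000)."""
    return {**SHARED, "T": 2000, "N": n, **ALG3_PER_BATCH[n]}
```

For example, alg3_config(500) combines the shared values with T = 2000, α = 0.5, η = 0.2, K = 100 and N_k = 5. A batch size outside the three quoted choices raises a KeyError, which seems preferable to silently interpolating values the paper does not report.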