Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis
Authors: Ziyi Chen, Yi Zhou, Rong-Rong Chen, Shaofeng Zou
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, University of Utah; Department of Electrical Engineering, University at Buffalo. |
| Pseudocode | Yes | Algorithm 1 Decentralized Actor-Critic; Algorithm 2 Decentralized TD (critic update); Algorithm 3 Decentralized Natural Actor-Critic |
| Open Source Code | No | The paper does not include an unambiguous statement or link indicating the release of source code for the methodology described. |
| Open Datasets | No | The paper describes experiments in simulated environments (e.g., "decentralized ring network", "fully connected network", "two-agent Cliff Navigation environment") rather than using a publicly available dataset with a specific link or citation. |
| Dataset Splits | No | The paper describes experiments in simulated environments and evaluates performance over iterations, but it does not specify explicit train/validation/test dataset splits typical for supervised learning tasks. |
| Hardware Specification | No | The paper describes the simulation setup and hyperparameters but does not provide any specific details about the hardware (e.g., GPU/CPU models) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | For our Algorithm 1, we choose $T = 500$, $T_c = 50$, $T'_c = 10$, $N_c = 10$, $T' = T_z = 5$, $\beta = 0.5$, $\{\sigma_m\}_{m=1}^{6} = 0.1$, and consider batch size choices $N = 100, 500, 2000$. Algorithm 3 uses the same hyperparameters as those of Algorithm 1 except that $T = 2000$ in Algorithm 3. We select $\alpha = 10, 50, 200$ for Algorithm 1 with $N = 100, 500, 2000$, respectively, and $T_z = 5$, $\alpha = 0.1, 0.5, 2$, $\eta = 0.04, 0.2, 0.8$, $K = 50, 100, 200$, $N_k = 2, 5, 10$ for Algorithm 3 with $N = 100, 500, 2000$, respectively. |
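As a reading aid, the hyperparameters quoted in the Experiment Setup row can be organized into one configuration per run. The Python sketch below is hypothetical and is not code from the paper, which released no source. The field names (`T_c_prime`, `T_prime`, `N_k`, and so on) are transliterations chosen here, and the primed symbols $T'_c$ and $T'$ are a reconstruction of extraction-garbled text. The only structure it encodes is what the quote states: Algorithm 3 reuses Algorithm 1's shared settings with $T = 2000$, and the step sizes and batch-dependent parameters vary with $N$.

```python
"""Per-run hyperparameter configs transcribed from the Experiment Setup row.

Hypothetical sketch: key names are our own transliterations of the paper's
symbols, and the primes on T'_c and T' are reconstructed from garbled text.
"""

# Settings shared by Algorithm 1 and Algorithm 3.
SHARED = {
    "T_c": 50,
    "T_c_prime": 10,    # reconstructed as T'_c (assumption)
    "N_c": 10,
    "T_prime": 5,       # reconstructed as T' (assumption)
    "T_z": 5,
    "beta": 0.5,
    "sigma": (0.1,) * 6,  # sigma_m = 0.1 for m = 1, ..., 6
}

# Batch sizes N; every tuple below is listed in this same order.
BATCH_SIZES = (100, 500, 2000)

ALG1_ALPHA = (10, 50, 200)

ALG3_ALPHA = (0.1, 0.5, 2)
ALG3_ETA = (0.04, 0.2, 0.8)
ALG3_K = (50, 100, 200)
ALG3_N_K = (2, 5, 10)


def algorithm1_configs():
    """Return one config dict per batch size for Algorithm 1 (T = 500)."""
    return [
        {**SHARED, "T": 500, "N": n, "alpha": a}
        for n, a in zip(BATCH_SIZES, ALG1_ALPHA)
    ]


def algorithm3_configs():
    """Return one config dict per batch size for Algorithm 3 (T = 2000)."""
    return [
        {**SHARED, "T": 2000, "N": n, "alpha": a, "eta": e, "K": k, "N_k": nk}
        for n, a, e, k, nk in zip(
            BATCH_SIZES, ALG3_ALPHA, ALG3_ETA, ALG3_K, ALG3_N_K
        )
    ]


if __name__ == "__main__":
    # Print the batch-dependent settings for each Algorithm 3 run.
    for cfg in algorithm3_configs():
        print(cfg["N"], cfg["alpha"], cfg["eta"], cfg["K"], cfg["N_k"])
```

Keeping the shared values in a single dictionary and merging them with `**` keeps the "same hyperparameters except $T$" relationship explicit. A typo in one algorithm's shared settings therefore cannot silently diverge from the other.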