Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Authors: Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation: "To further validate the theoretical and experimental benefits of our algorithm, we conducted a numerical simulation to showcase the superiority of our algorithm with sequential updates structure over naive independent policy gradient updates." Results: "We test our algorithm with sequential gradient updates and the independent learning method in different settings. The results are shown in Figure 1." |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, Princeton University, NJ, USA 2Department of Statistics and Data Science, Yale University, CT, USA 3Department of Industrial Engineering and Management Sciences, Northwestern University, IL, USA. |
| Pseudocode | Yes | Algorithm 1 Multi-Agent PPO and Algorithm 2 Pessimistic Multi-Agent PPO with Linear Function Approximation |
| Open Source Code | Yes | 5Implementation can be found at https://github.com/zhaoyl18/ratio_game. |
| Open Datasets | No | We consider von Neumann's ratio game, a simple stochastic game also used by Daskalakis et al. (2020). |
| Dataset Splits | No | We test our algorithm with sequential gradient updates and the independent learning method in different settings. |
| Hardware Specification | No | The paper does not provide specific hardware details for running its experiments. |
| Software Dependencies | No | 5Implementation can be found at https://github.com/zhaoyl18/ratio_game. |
| Experiment Setup | Yes | In this example, a big stepsize would help alleviate the issue (e.g., in (c), independent PG escapes the stationary point after 3000 iterations). |
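The experiment contrasts sequential gradient updates with naive independent policy gradient on von Neumann's ratio game, whose value is V(x, y) = (xᵀRy)/(xᵀSy) for strategies x, y on the simplex. The sketch below is a minimal, hedged illustration of that comparison, not the paper's implementation: the payoff matrices `R` and `S` are assumed placeholders, policies are softmax-parameterized, and gradients are taken numerically for simplicity.

```python
import numpy as np

# Hypothetical 2x2 ratio game (illustrative matrices, not from the paper).
R = np.array([[3.0, 0.0], [0.0, 1.0]])  # numerator payoffs
S = np.array([[1.0, 1.0], [1.0, 1.0]])  # strictly positive denominator weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def value(theta_x, theta_y):
    """Ratio-game value V = (x^T R y) / (x^T S y) under softmax policies."""
    x, y = softmax(theta_x), softmax(theta_y)
    return (x @ R @ y) / (x @ S @ y)

def num_grad(f, theta, eps=1e-6):
    """Central-difference gradient of a scalar function of theta."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = eps
        g[i] = (f(theta + d) - f(theta - d)) / (2 * eps)
    return g

def run(sequential, iters=2000, lr=0.5):
    tx, ty = np.zeros(2), np.zeros(2)  # uniform initial policies
    for _ in range(iters):
        gx = num_grad(lambda t: value(t, ty), tx)        # max-player gradient
        if sequential:
            tx = tx + lr * gx                            # max-player moves first...
            gy = num_grad(lambda t: value(tx, t), ty)    # ...min-player reacts
        else:
            gy = num_grad(lambda t: value(tx, t), ty)    # simultaneous (independent) grads
            tx = tx + lr * gx
        ty = ty - lr * gy                                # min-player descends
    return value(tx, ty)

print("sequential:", run(sequential=True))
print("independent:", run(sequential=False))
```

The only difference between the two modes is where the min-player's gradient is evaluated: at the max-player's updated policy (sequential) or at the stale joint point (independent), which is the structural distinction the paper's Figure 1 probes.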