Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Authors: Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental Simulation | "To further validate the theoretical and experimental benefits of our algorithm, we conducted a numerical simulation to showcase the superiority of our algorithm with sequential updates structure over naive independent policy gradient updates." and "We test our algorithm with sequential gradient updates and the independent learning method in different settings. The results are shown in Figure 1." (a code sketch of this comparison follows the table)
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Princeton University, NJ, USA; Department of Statistics and Data Science, Yale University, CT, USA; Department of Industrial Engineering and Management Sciences, Northwestern University, IL, USA.
Pseudocode | Yes | Algorithm 1 (Multi-Agent PPO) and Algorithm 2 (Pessimistic Multi-Agent PPO with Linear Function Approximation)
Open Source Code | Yes | Implementation can be found at https://github.com/zhaoyl18/ratio_game.
Open Datasets | No | We consider von Neumann's ratio game, a simple stochastic game also used by Daskalakis et al. (2020).
Dataset Splits | No | We test our algorithm with sequential gradient updates and the independent learning method in different settings.
Hardware Specification | No | The paper does not provide specific hardware details for running its experiments.
Software Dependencies | No | Implementation can be found at https://github.com/zhaoyl18/ratio_game.
Experiment Setup | Yes | In this example, a big stepsize would help alleviate the issue (e.g., in (c), independent PG escapes the stationary point after 3000 iterations).
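The Research Type, Open Datasets, and Experiment Setup rows above describe a simulation on von Neumann's ratio game comparing sequential gradient updates against naive independent policy gradient. The authors' actual implementation is at https://github.com/zhaoyl18/ratio_game; the sketch below is only an illustrative approximation of that comparison. The payoff matrices R and S, the softmax logit parameterization, the stepsize, the iteration count, and the finite-difference gradients are assumptions made for illustration, not values taken from the paper.

```python
# Minimal sketch (not the authors' code): von Neumann's ratio game with
# softmax-parameterized policies, comparing naive independent (simultaneous)
# policy-gradient updates against sequential, player-by-player updates.
# R and S below are illustrative placeholders, not the paper's instance.
import numpy as np

R = np.array([[0.0, 1.0], [1.0, 0.0]])   # reward matrix (hypothetical)
S = np.array([[1.0, 0.5], [0.5, 1.0]])   # positive denominator matrix (hypothetical)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def value(x, y):
    # Ratio-game objective V(x, y) = (x^T R y) / (x^T S y)
    return (x @ R @ y) / (x @ S @ y)

def grad_logits(theta_x, theta_y, eps=1e-5):
    # Central finite-difference gradients of V with respect to each player's logits
    x, y = softmax(theta_x), softmax(theta_y)
    gx, gy = np.zeros_like(theta_x), np.zeros_like(theta_y)
    for i in range(len(theta_x)):
        d = np.zeros_like(theta_x); d[i] = eps
        gx[i] = (value(softmax(theta_x + d), y) - value(softmax(theta_x - d), y)) / (2 * eps)
    for j in range(len(theta_y)):
        d = np.zeros_like(theta_y); d[j] = eps
        gy[j] = (value(x, softmax(theta_y + d)) - value(x, softmax(theta_y - d))) / (2 * eps)
    return gx, gy

def run(sequential, iters=3000, lr=0.5):
    theta_x = np.zeros(2)  # max player's logits
    theta_y = np.zeros(2)  # min player's logits
    for _ in range(iters):
        if sequential:
            # Sequential structure: player 1 updates first, player 2 then
            # responds to the already-updated policy of player 1.
            gx, _ = grad_logits(theta_x, theta_y)
            theta_x += lr * gx
            _, gy = grad_logits(theta_x, theta_y)
            theta_y -= lr * gy
        else:
            # Independent policy gradient: both players update simultaneously
            # using gradients evaluated at the same joint policy.
            gx, gy = grad_logits(theta_x, theta_y)
            theta_x += lr * gx
            theta_y -= lr * gy
    return value(softmax(theta_x), softmax(theta_y))

print("sequential :", run(sequential=True))
print("independent:", run(sequential=False))
```

Whether independent updates stall near a stationary point depends on the particular (R, S) instance and stepsize; the paper's Figure 1 uses a specific instance chosen to exhibit that behavior, which this placeholder instance need not reproduce.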