Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning

Authors: Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental Simulation | "To further validate the theoretical and experimental benefits of our algorithm, we conducted a numerical simulation to showcase the superiority of our algorithm with sequential updates structure over naive independent policy gradient updates." and "We test our algorithm with sequential gradient updates and the independent learning method in different settings. The results are shown in Figure 1." (a code sketch of this comparison follows the table)
Researcher Affiliation | Academia | Department of Electrical and Computer Engineering, Princeton University, NJ, USA; Department of Statistics and Data Science, Yale University, CT, USA; Department of Industrial Engineering and Management Sciences, Northwestern University, IL, USA.
Pseudocode | Yes | Algorithm 1 (Multi-Agent PPO) and Algorithm 2 (Pessimistic Multi-Agent PPO with Linear Function Approximation)
Open Source Code | Yes | Implementation can be found at https://github.com/zhaoyl18/ratio_game.
Open Datasets | No | We consider von Neumann's ratio game, a simple stochastic game also used by Daskalakis et al. (2020).
Dataset Splits | No | We test our algorithm with sequential gradient updates and the independent learning method in different settings.
Hardware Specification | No | The paper does not provide specific hardware details for running its experiments.
Software Dependencies | No | Implementation can be found at https://github.com/zhaoyl18/ratio_game.
Experiment Setup | Yes | In this example, a big stepsize would help alleviate the issue (e.g., in (c), independent PG escapes the stationary point after 3000 iterations).
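The Research Type, Open Datasets, and Experiment Setup rows above describe a simulation on von Neumann's ratio game comparing sequential gradient updates against naive independent policy gradient. The authors' actual implementation is at https://github.com/zhaoyl18/ratio_game; the sketch below is only an illustrative approximation of that comparison. The payoff matrices R and S, the softmax logit parameterization, the stepsize, the iteration count, and the finite-difference gradients are assumptions made for illustration, not values taken from the paper.

```python
# Minimal sketch (not the authors' code): von Neumann's ratio game with
# softmax-parameterized policies, comparing naive independent (simultaneous)
# policy-gradient updates against sequential, player-by-player updates.
# R and S below are illustrative placeholders, not the paper's instance.
import numpy as np

R = np.array([[0.0, 1.0], [1.0, 0.0]])   # reward matrix (hypothetical)
S = np.array([[1.0, 0.5], [0.5, 1.0]])   # positive denominator matrix (hypothetical)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def value(x, y):
    # Ratio-game objective V(x, y) = (x^T R y) / (x^T S y)
    return (x @ R @ y) / (x @ S @ y)

def grad_logits(theta_x, theta_y, eps=1e-5):
    # Central finite-difference gradients of V with respect to each player's logits
    x, y = softmax(theta_x), softmax(theta_y)
    gx, gy = np.zeros_like(theta_x), np.zeros_like(theta_y)
    for i in range(len(theta_x)):
        d = np.zeros_like(theta_x); d[i] = eps
        gx[i] = (value(softmax(theta_x + d), y) - value(softmax(theta_x - d), y)) / (2 * eps)
    for j in range(len(theta_y)):
        d = np.zeros_like(theta_y); d[j] = eps
        gy[j] = (value(x, softmax(theta_y + d)) - value(x, softmax(theta_y - d))) / (2 * eps)
    return gx, gy

def run(sequential, iters=3000, lr=0.5):
    theta_x = np.zeros(2)  # max player's logits
    theta_y = np.zeros(2)  # min player's logits
    for _ in range(iters):
        if sequential:
            # Sequential structure: player 1 updates first, player 2 then
            # responds to the already-updated policy of player 1.
            gx, _ = grad_logits(theta_x, theta_y)
            theta_x += lr * gx
            _, gy = grad_logits(theta_x, theta_y)
            theta_y -= lr * gy
        else:
            # Independent policy gradient: both players update simultaneously
            # using gradients evaluated at the same joint policy.
            gx, gy = grad_logits(theta_x, theta_y)
            theta_x += lr * gx
            theta_y -= lr * gy
    return value(softmax(theta_x), softmax(theta_y))

print("sequential :", run(sequential=True))
print("independent:", run(sequential=False))
```

Whether independent updates stall near a stationary point depends on the particular (R, S) instance and stepsize; the paper's Figure 1 uses a specific instance chosen to exhibit that behavior, which this placeholder instance need not reproduce.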