Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
Authors: Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation: To further validate the theoretical and experimental benefits of our algorithm, we conducted a numerical simulation to showcase the superiority of our algorithm with sequential updates structure over naive independent policy gradient updates. Results: We test our algorithm with sequential gradient updates and the independent learning method in different settings. The results are shown in Figure 1. |
| Researcher Affiliation | Academia | ¹Department of Electrical and Computer Engineering, Princeton University, NJ, USA; ²Department of Statistics and Data Science, Yale University, CT, USA; ³Department of Industrial Engineering and Management Sciences, Northwestern University, IL, USA. |
| Pseudocode | Yes | Algorithm 1: Multi-Agent PPO; Algorithm 2: Pessimistic Multi-Agent PPO with Linear Function Approximation |
| Open Source Code | Yes | Footnote 5: Implementation can be found at https://github.com/zhaoyl18/ratio_game. |
| Open Datasets | No | We consider von Neumann's ratio game, a simple stochastic game also used by Daskalakis et al. (2020). |
| Dataset Splits | No | We test our algorithm with sequential gradient updates and the independent learning method in different settings. |
| Hardware Specification | No | The paper does not provide specific hardware details for running its experiments. |
| Software Dependencies | No | Footnote 5: Implementation can be found at https://github.com/zhaoyl18/ratio_game. |
| Experiment Setup | Yes | In this example, a big stepsize would help alleviate the issue (e.g., in (c), independent PG escapes the stationary point after 3000 iterations). |