A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
Authors: Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we present a sharp analysis of model-based self-play algorithms for multi-agent Markov games. We design an algorithm, Optimistic Nash Value Iteration (Nash-VI), for two-player zero-sum Markov games that is able to output an ε-approximate Nash policy in Õ(H³SAB/ε²) episodes of game playing. |
| Researcher Affiliation | Collaboration | Princeton University; Massachusetts Institute of Technology; Salesforce Research. |
| Pseudocode | Yes | Algorithm 1: Optimistic Nash Value Iteration (Nash-VI); Algorithm 2: Optimistic Value Iteration with Zero Reward (VI-Zero). An illustrative sketch of the Nash-VI backup is given after this table. |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing code or a link to a code repository for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees for Markov games defined by abstract parameters (S, A, B, H), rather than empirical evaluation on specific datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not mention any specific hardware used for computational work or experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, describing algorithms and their guarantees, and does not provide details on experimental setup such as hyperparameters or training configurations for empirical runs. |
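
Since the paper specifies Nash-VI only as pseudocode with theoretical guarantees, the sketch below is a rough, simplified illustration of the kind of computation it performs: optimistic value iteration over an empirical model of a two-player zero-sum Markov game, where each stage game is solved exactly by linear programming. The function names (`solve_zero_sum_game`, `optimistic_nash_backup`) and the Hoeffding-style exploration bonus are our own assumptions for illustration; the paper's Algorithm 1 uses a sharper Bernstein-style bonus and maintains both upper and lower value bounds.

```python
import numpy as np
from scipy.optimize import linprog


def solve_zero_sum_game(Q):
    """Value and max-player policy of the matrix game max_x min_y x^T Q y.

    Q has shape (A, B). Solved with the standard LP: maximize v subject to
    (Q^T x)_b >= v for every opponent action b, with x on the simplex.
    """
    A, B = Q.shape
    # Decision variables: [x_1, ..., x_A, v]; maximize v  <=>  minimize -v.
    c = np.zeros(A + 1)
    c[-1] = -1.0
    # For every opponent action b:  v - sum_a x_a Q[a, b] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((B, 1))])
    b_ub = np.zeros(B)
    # Simplex constraint: sum_a x_a = 1.
    A_eq = np.hstack([np.ones((1, A)), np.zeros((1, 1))])
    b_eq = np.ones(1)
    bounds = [(0, None)] * A + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    x = res.x[:A]
    return res.x[-1], x / x.sum()


def optimistic_nash_backup(P_hat, r, N, H, S, c_bonus=1.0):
    """One round of optimistic value iteration over the empirical model.

    P_hat: (H, S, A, B, S) empirical transitions, r: (H, S, A, B) rewards,
    N: (H, S, A, B) visit counts. Returns upper-bound values and a greedy
    max-player policy per (h, s). The bonus here is a simplified
    Hoeffding-style term, not the exact bonus from the paper.
    """
    V_up = np.zeros((H + 1, S))
    policy = {}
    for h in reversed(range(H)):
        for s in range(S):
            bonus = c_bonus * H / np.sqrt(np.maximum(N[h, s], 1))
            Q_up = np.clip(r[h, s] + P_hat[h, s] @ V_up[h + 1] + bonus, 0.0, H)
            value, x = solve_zero_sum_game(Q_up)
            V_up[h, s] = value
            policy[(h, s)] = x
    return V_up, policy
```

Note that the paper's full algorithm additionally maintains a pessimistic lower bound and certifies an ε-approximate Nash policy once the gap between the two bounds is small; the sketch above covers only the optimistic upper-bound backup.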