Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
Authors: Qiwen Cui, Simon S. Du
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle that builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than the prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales with $\sum_{i=1}^m A_i$, where $A_i$ is the action size of the $i$-th player and $m$ is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space $\prod_{i=1}^m A_i$ due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy that is close to the best strategy in $\Pi$. In this setting, the sample complexity only scales with $\log\lvert\Pi\rvert$ instead of $\sum_{i=1}^m A_i$. |
| Researcher Affiliation | Academia | Qiwen Cui, Paul G. Allen School of Computer Science & Engineering, University of Washington, qwcui@cs.washington.edu; Simon S. Du, Paul G. Allen School of Computer Science & Engineering, University of Washington, ssdu@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1 Strategy-wise Bonus + Maxi Min Optimization (SBMM) |
| Open Source Code | No | The paper is theoretical and does not mention providing open-source code for the described methodology. The ethical checklist explicitly states 'N/A' for questions related to code and experiments. |
| Open Datasets | No | The paper is theoretical and defines a 'compliant' dataset structure but does not use or provide access information for a specific, publicly available dataset. The ethical checklist states 'N/A' for experimental details. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments, and therefore does not specify training/test/validation dataset splits. The ethical checklist states 'N/A' for training details including data splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications. The ethical checklist explicitly states 'N/A' for questions related to compute resources. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for running experiments. The ethical checklist explicitly states 'N/A' for questions related to experiments. |
| Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, such as hyperparameters or system-level training settings. The ethical checklist explicitly states 'N/A' for training details. |
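
As an informal illustration of the scaling claim quoted in the Research Type row above (this sketch is not from the paper; the action sizes are made-up), the snippet below contrasts the per-player sum $\sum_{i=1}^m A_i$, which the strategy-wise bonus achieves, with the joint-action-space product $\prod_{i=1}^m A_i$, which point-wise bonuses incur and which drives the curse of multiagents.

```python
from math import prod

# Hypothetical action-set sizes A_i for m = 5 players (illustrative values only).
action_sizes = [10, 10, 10, 10, 10]

# Point-wise bonus: sample complexity scales with the joint action space, prod_i A_i.
point_wise_scale = prod(action_sizes)   # 10^5 = 100,000

# Strategy-wise bonus: sample complexity scales with the per-player sum, sum_i A_i.
strategy_wise_scale = sum(action_sizes)  # 50

print(f"point-wise   (prod A_i): {point_wise_scale}")
print(f"strategy-wise (sum A_i): {strategy_wise_scale}")
```

The gap grows exponentially in the number of players, which is the intuition behind the abstract's contrast between the two concentration principles.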