Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

Authors: Qiwen Cui, Simon S. Du

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle, which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle that builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales with Σ_{i=1}^m A_i, where A_i is the action size of the i-th player and m is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space, ∏_{i=1}^m A_i, due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class Π as input and output a strategy that is close to the best strategy in Π. In this setting, the sample complexity only scales with log |Π| instead of Σ_{i=1}^m A_i. (An illustrative scaling comparison follows the table below.)
Researcher Affiliation | Academia | Qiwen Cui, Paul G. Allen School of Computer Science & Engineering, University of Washington, qwcui@cs.washington.edu; Simon S. Du, Paul G. Allen School of Computer Science & Engineering, University of Washington, ssdu@cs.washington.edu
Pseudocode | Yes | Algorithm 1: Strategy-wise Bonus + MaxiMin Optimization (SBMM)
Open Source Code | No | The paper is theoretical and does not mention providing open-source code for the described methodology. The ethical checklist explicitly states 'N/A' for questions related to code and experiments.
Open Datasets | No | The paper is theoretical and defines a 'compliant' dataset structure but does not use or provide access information for a specific, publicly available dataset. The ethical checklist states 'N/A' for experimental details.
Dataset Splits | No | The paper is theoretical and does not conduct experiments, and therefore does not specify training/test/validation dataset splits. The ethical checklist states 'N/A' for training details, including data splits.
Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications. The ethical checklist explicitly states 'N/A' for questions related to compute resources.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for running experiments. The ethical checklist explicitly states 'N/A' for questions related to experiments.
Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, such as hyperparameters or system-level training settings. The ethical checklist explicitly states 'N/A' for training details.
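Scaling comparison (illustrative). As a rough sense of the gap described in the abstract above, the minimal sketch below compares the Σ_{i=1}^m A_i dependence obtained with the strategy-wise bonus against the ∏_{i=1}^m A_i joint-action dependence of point-wise bonus methods. The player count, action sizes, and helper name are assumptions chosen for illustration, not values from the paper.

```python
# Illustrative sketch only: the action sizes below are hypothetical and the
# helper `scaling_comparison` is not from the paper.
import math

def scaling_comparison(action_sizes):
    """Return (sum of A_i, product of A_i) for per-player action-space sizes."""
    return sum(action_sizes), math.prod(action_sizes)

if __name__ == "__main__":
    # Hypothetical example: m = 4 players, each with A_i = 10 actions.
    action_sizes = [10, 10, 10, 10]
    total_sum, total_product = scaling_comparison(action_sizes)
    print(f"sum_i A_i  = {total_sum}")      # 40: strategy-wise bonus dependence
    print(f"prod_i A_i = {total_product}")  # 10000: point-wise (joint-action) dependence
```

With m = 4 and A_i = 10, the per-player sum is 40 while the joint action space already has 10,000 entries; this gap is the "curse of multiagents" the abstract refers to.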