Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus

Authors: Qiwen Cui, Simon S. Du

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper considers offline multi-agent reinforcement learning. We propose the strategy-wise concentration principle, which directly builds a confidence interval for the joint strategy, in contrast to the point-wise concentration principle that builds a confidence interval for each point in the joint action space. For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than prior methods based on the point-wise bonus. Furthermore, for offline multi-agent general-sum Markov games, based on the strategy-wise bonus and a novel surrogate function, we give the first algorithm whose sample complexity only scales with Σ_{i=1}^m A_i, where A_i is the action size of the i-th player and m is the number of players. In sharp contrast, the sample complexity of methods based on the point-wise bonus would scale with the size of the joint action space, ∏_{i=1}^m A_i, due to the curse of multiagents. Lastly, all of our algorithms can naturally take a pre-specified strategy class Π as input and output a strategy that is close to the best strategy in Π. In this setting, the sample complexity only scales with log |Π| instead of Σ_{i=1}^m A_i. (An illustrative scaling comparison follows the table below.)
Researcher Affiliation | Academia | Qiwen Cui, Paul G. Allen School of Computer Science & Engineering, University of Washington, qwcui@cs.washington.edu; Simon S. Du, Paul G. Allen School of Computer Science & Engineering, University of Washington, ssdu@cs.washington.edu
Pseudocode | Yes | Algorithm 1: Strategy-wise Bonus + MaxiMin Optimization (SBMM)
Open Source Code | No | The paper is theoretical and does not mention providing open-source code for the described methodology. The ethical checklist explicitly states 'N/A' for questions related to code and experiments.
Open Datasets | No | The paper is theoretical and defines a 'compliant' dataset structure but does not use or provide access information for a specific, publicly available dataset. The ethical checklist states 'N/A' for experimental details.
Dataset Splits | No | The paper is theoretical and does not conduct experiments, and therefore does not specify training/test/validation dataset splits. The ethical checklist states 'N/A' for training details, including data splits.
Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications. The ethical checklist explicitly states 'N/A' for questions related to compute resources.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers for running experiments. The ethical checklist explicitly states 'N/A' for questions related to experiments.
Experiment Setup | No | The paper is theoretical and does not include details about an experimental setup, such as hyperparameters or system-level training settings. The ethical checklist explicitly states 'N/A' for training details.
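Scaling comparison (illustrative). As a rough sense of the gap described in the abstract above, the minimal sketch below compares the Σ_{i=1}^m A_i dependence obtained with the strategy-wise bonus against the ∏_{i=1}^m A_i joint-action dependence of point-wise bonus methods. The player count, action sizes, and helper name are assumptions chosen for illustration, not values from the paper.

```python
# Illustrative sketch only: the action sizes below are hypothetical and the
# helper `scaling_comparison` is not from the paper.
import math

def scaling_comparison(action_sizes):
    """Return (sum of A_i, product of A_i) for per-player action-space sizes."""
    return sum(action_sizes), math.prod(action_sizes)

if __name__ == "__main__":
    # Hypothetical example: m = 4 players, each with A_i = 10 actions.
    action_sizes = [10, 10, 10, 10]
    total_sum, total_product = scaling_comparison(action_sizes)
    print(f"sum_i A_i  = {total_sum}")      # 40: strategy-wise bonus dependence
    print(f"prod_i A_i = {total_product}")  # 10000: point-wise (joint-action) dependence
```

With m = 4 and A_i = 10, the per-player sum is 40 while the joint action space already has 10,000 entries; this gap is the "curse of multiagents" the abstract refers to.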