The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

Authors: Chi Jin, Qinghua Liu, Tiancheng Yu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension, a new complexity measure adapted from its single-agent version (Jin et al., 2021). A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness. Our theoretical framework is generic and applies to a wide range of models, including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.
Researcher Affiliation | Academia | Chi Jin (Princeton University), Qinghua Liu (Princeton University), Tiancheng Yu (MIT). Equal contribution. Correspondence to: Chi Jin <chij@princeton.edu>.
Pseudocode | Yes | Algorithm 1: GOLF WITH EXPLOITER(F, G, K, β); Algorithm 2: COMPUTE EXPLOITER(F, G, β, D, µ). (A toy illustration of the exploiter idea appears after this table.)
Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees, not on empirical evaluation with datasets. It discusses 'samples' in the context of theoretical analysis of learning from interactions, not as a publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation that would require specifying training/validation/test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments requiring hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe an implementation with specific software dependencies or their versions.
Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training settings for empirical runs.
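The pseudocode row above names GOLF WITH EXPLOITER, whose core idea is pairing a main player with an exploiter that best-responds to her current strategy. Below is a minimal, self-contained toy sketch of that idea in a zero-sum matrix game: the main player runs multiplicative weights while the exploiter plays an exact best response each round. This is an illustration only, not the paper's algorithm; the random payoff matrix A, the learning rate eta, and the exact-best-response exploiter are all assumptions made for the sketch, whereas the paper's exploiter is fitted from data over a function class G in a general Markov game.

```python
import numpy as np

# Toy sketch (not the paper's algorithm): a zero-sum matrix game where
# the "main player" learns with multiplicative weights and an "exploiter"
# always best-responds to her current mixed strategy. The exploiter's
# pressure drives the main player's average strategy toward Nash.

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))        # illustrative payoffs; main player maximizes x^T A y

K, eta = 5000, 0.05                    # rounds and step size (assumed values)
x = np.ones(A.shape[0]) / A.shape[0]   # main player's mixed strategy
avg_x = np.zeros_like(x)

for _ in range(K):
    # Exploiter: pure best response that minimizes the main player's payoff.
    y = np.zeros(A.shape[1])
    y[np.argmin(x @ A)] = 1.0

    # Main player: multiplicative-weights update on her payoffs against y.
    x = x * np.exp(eta * (A @ y))
    x /= x.sum()
    avg_x += x

avg_x /= K
# Worst-case payoff of the averaged strategy; it approaches the game's
# max-min value, so the averaged strategy is approximately Nash.
print("worst-case payoff of averaged strategy:", (avg_x @ A).min())
```

In the paper's setting, the role of this exact best response is played by the COMPUTE EXPLOITER subroutine, which fits an approximate best response from collected trajectories; per the abstract, it is this deliberate exploitation of the main player's weakness that drives learning while keeping the overall sample complexity polynomial.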