The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
Authors: Chi Jin, Qinghua Liu, Tiancheng Yu
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension, a new complexity measure adapted from its single-agent version (Jin et al., 2021). A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness. Our theoretical framework is generic, which applies to a wide range of models including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations. |
| Researcher Affiliation | Academia | Chi Jin (1), Qinghua Liu (1), Tiancheng Yu (2). *Equal contribution. (1) Princeton University, (2) MIT. Correspondence to: Chi Jin <chij@princeton.edu>. |
| Pseudocode | Yes | Algorithm 1: GOLF WITH EXPLOITER (F, G, K, β); Algorithm 2: COMPUTE EXPLOITER (F, G, β, D, µ) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees, not on empirical evaluation with datasets. It discusses 'samples' in the context of theoretical analysis of learning from interactions, not as a publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation that would require specifying training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments requiring hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe an implementation with specific software dependencies or their versions. |
| Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training settings for empirical runs. |
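
The "exploiter" idea quoted in the Research Type row can be illustrated on a toy problem. The sketch below is not the paper's Algorithm 1 or Algorithm 2, and the paper provides no code; it is a minimal illustration, assuming a two-player zero-sum matrix game (rock-paper-scissors) in which the main player's mixed strategy is fixed and a hypothetical helper `exploiter_best_response` finds the opponent action that hurts her most. The function name, the payoff matrix, and the example strategies are all assumptions made for illustration.

```python
import numpy as np

def exploiter_best_response(payoff, main_policy):
    """Best response of an exploiter against a fixed main-player strategy.

    payoff[i, j] is the main (row) player's reward when she plays i and
    the exploiter (column player) plays j. The exploiter minimizes it.
    This is an illustrative helper, not an algorithm from the paper.
    """
    # Expected main-player reward for each exploiter action.
    expected = main_policy @ payoff
    action = int(np.argmin(expected))
    return action, float(expected[action])

# Rock-paper-scissors from the main player's point of view
# (rows and columns ordered rock, paper, scissors).
rps = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])

uniform = np.ones(3) / 3                  # the Nash strategy of this game
biased = np.array([0.5, 0.25, 0.25])      # plays rock too often

_, uniform_value = exploiter_best_response(rps, uniform)
biased_action, biased_value = exploiter_best_response(rps, biased)

# The game value is 0, so the gap measures how exploitable a strategy is.
uniform_gap = 0.0 - uniform_value
biased_gap = 0.0 - biased_value
```

Under these assumptions the uniform strategy cannot be exploited, while the rock-heavy strategy is punished by the exploiter playing paper. That gap is the kind of weakness the exploiter is meant to expose to the main player.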