The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

Authors: Chi Jin, Qinghua Liu, Tiancheng Yu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension, a new complexity measure adapted from its single-agent version (Jin et al., 2021). A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness. Our theoretical framework is generic and applies to a wide range of models, including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.
Researcher Affiliation | Academia | Chi Jin (Princeton University), Qinghua Liu (Princeton University), Tiancheng Yu (MIT). Equal contribution. Correspondence to: Chi Jin <chij@princeton.edu>.
Pseudocode | Yes | Algorithm 1: GOLF WITH EXPLOITER(F, G, K, β); Algorithm 2: COMPUTE EXPLOITER(F, G, β, D, µ). (A toy illustration of the exploiter idea appears after this table.)
Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and focuses on algorithm design and theoretical guarantees, not on empirical evaluation with datasets. It discusses 'samples' in the context of theoretical analysis of learning from interactions, not as a publicly available dataset.
Dataset Splits | No | The paper is theoretical and does not involve empirical evaluation that would require specifying training/validation/test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments requiring hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe an implementation with specific software dependencies or their versions.
Experiment Setup | No | The paper is theoretical and does not detail an experimental setup, hyperparameters, or training settings for empirical runs.
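The pseudocode row above names GOLF WITH EXPLOITER, whose core idea is pairing a main player with an exploiter that best-responds to her current strategy. Below is a minimal, self-contained toy sketch of that idea in a zero-sum matrix game: the main player runs multiplicative weights while the exploiter plays an exact best response each round. This is an illustration only, not the paper's algorithm; the random payoff matrix A, the learning rate eta, and the exact-best-response exploiter are all assumptions made for the sketch, whereas the paper's exploiter is fitted from data over a function class G in a general Markov game.

```python
import numpy as np

# Toy sketch (not the paper's algorithm): a zero-sum matrix game where
# the "main player" learns with multiplicative weights and an "exploiter"
# always best-responds to her current mixed strategy. The exploiter's
# pressure drives the main player's average strategy toward Nash.

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))        # illustrative payoffs; main player maximizes x^T A y

K, eta = 5000, 0.05                    # rounds and step size (assumed values)
x = np.ones(A.shape[0]) / A.shape[0]   # main player's mixed strategy
avg_x = np.zeros_like(x)

for _ in range(K):
    # Exploiter: pure best response that minimizes the main player's payoff.
    y = np.zeros(A.shape[1])
    y[np.argmin(x @ A)] = 1.0

    # Main player: multiplicative-weights update on her payoffs against y.
    x = x * np.exp(eta * (A @ y))
    x /= x.sum()
    avg_x += x

avg_x /= K
# Worst-case payoff of the averaged strategy; it approaches the game's
# max-min value, so the averaged strategy is approximately Nash.
print("worst-case payoff of averaged strategy:", (avg_x @ A).min())
```

In the paper's setting, the role of this exact best response is played by the COMPUTE EXPLOITER subroutine, which fits an approximate best response from collected trajectories; per the abstract, it is this deliberate exploitation of the main player's weakness that drives learning while keeping the overall sample complexity polynomial.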