Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning

Authors: Matthias Gerstgrasser, David C. Parkes

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches.
Researcher Affiliation | Collaboration | (1) John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA; (2) Computer Science Department, Stanford University, Stanford, CA, USA; (3) DeepMind, London, UK.
Pseudocode | Yes | Algorithm 1 in the Appendix details this in pseudo-code.
Open Source Code | No | The paper does not provide a statement or link indicating that its source code is publicly available.
Open Datasets | Yes | We evaluate our Meta-RL approach on both a benchmark iterated matrix game domain... as well as on a novel Atari 2600-based domain... We modify the Atari 2600 game Space Invaders.
Dataset Splits | No | The paper describes the experimental setup, but does not provide specific details on training, validation, and test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | Experiments were run on recent Intel Xeon processors with a single core and 2GB RAM per experiment.
Software Dependencies | Yes | All experiments were implemented using Ray / RLlib 2.0.0 (Liang et al., 2018)... and using Torch. Any hyperparameters not listed were left at default values in RLlib version 2.0.0.
Experiment Setup | Yes | We further use a two-stage training approach. In Phase 1, we train a follower meta-policy... In Phase 2, we train a leader policy... We use n = 10 steps per episode. ...we use discrete prices (0, 0.25, 0.5, 0.75, 1.0) for compatibility with the discrete Atari environment. Algorithm 1 details the two-phase learning algorithm we use. ... Table H lists the hyperparameters used for each of these algorithms.
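
The software-dependencies row above reports Ray / RLlib 2.0.0 with a Torch backend and otherwise-default hyperparameters. The sketch below shows what pinning and configuring that stack can look like; the PPO algorithm choice, the environment name, and the worker count are illustrative placeholders, not values taken from the paper.

```python
# Minimal RLlib 2.0.0 + Torch setup sketch (pip install "ray[rllib]==2.0.0" torch).
# PPO, the environment name, and the worker count are placeholders; the paper's
# domains are an iterated matrix game and a modified Space Invaders.
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env="CartPole-v1")    # placeholder environment
    .framework("torch")                # Torch backend, as stated in the paper
    .rollouts(num_rollout_workers=1)   # single core per experiment, per the hardware row
)
algo = config.build()  # all unset hyperparameters fall back to RLlib 2.0.0 defaults
print(algo.train()["episode_reward_mean"])
```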
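
The experiment-setup row describes a two-phase procedure: Phase 1 trains a follower meta-policy against many leader behaviours, and Phase 2 trains the leader against the now-fixed follower, which serves as an approximate best-response oracle. Algorithm 1 in the paper is the authoritative pseudo-code; the self-contained sketch below only illustrates that control flow, and every name in it (_StubPolicy, _rollout, train_two_phase) is a hypothetical stand-in rather than the paper's implementation.

```python
import random

class _StubPolicy:
    """Hypothetical stand-in for a learned policy; holds no real parameters."""
    def __init__(self):
        self.frozen = False
    def update(self, rollout):
        if not self.frozen:
            pass  # a real implementation would take an RL gradient step here
    def freeze(self):
        self.frozen = True

def _rollout(leader, follower, n_steps=10):
    """Hypothetical episode of n = 10 leader/follower interaction steps."""
    return [(random.random(), random.random()) for _ in range(n_steps)]

def train_two_phase(phase1_iters=5, phase2_iters=5, n_steps=10):
    follower, leader = _StubPolicy(), _StubPolicy()

    # Phase 1: train the follower meta-policy to respond well to many
    # (here, freshly sampled) leader policies.
    for _ in range(phase1_iters):
        sampled_leader = _StubPolicy()  # stand-in for sampling a leader policy
        follower.update(_rollout(sampled_leader, follower, n_steps))

    follower.freeze()  # follower held fixed as an approximate best-response oracle

    # Phase 2: train the leader against the frozen follower meta-policy.
    for _ in range(phase2_iters):
        leader.update(_rollout(leader, follower, n_steps))

    return leader, follower

if __name__ == "__main__":
    train_two_phase()
```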