Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
Authors: Matthias Gerstgrasser, David C. Parkes
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. |
| Researcher Affiliation | Collaboration | (1) John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA; (2) Computer Science Department, Stanford University, Stanford, CA, USA; (3) DeepMind, London, UK. |
| Pseudocode | Yes | Algorithm 1 in the Appendix details this in pseudo-code. |
| Open Source Code | No | The paper does not provide a statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate our Meta-RL approach on both a benchmark iterated matrix game domain... as well as on a novel Atari 2600-based domain... We modify the Atari 2600 game Space Invaders. |
| Dataset Splits | No | The paper describes the experimental setup, but does not provide specific details on training, validation, and test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | Experiments were run on recent Intel Xeon processors with a single core and 2GB RAM per experiment. |
| Software Dependencies | Yes | All experiments were implemented using Ray / RLlib 2.0.0 (Liang et al., 2018)... and using Torch. Any hyperparameters not listed were left at default values in rllib version 2.0.0. |
| Experiment Setup | Yes | We further use a two-stage training approach. In Phase 1, we train a follower meta-policy... In Phase 2, we train a leader policy... We use n = 10 steps per episode. ...we use discrete prices (0, 0.25, 0.5, 0.75, 1.0) for compatibility with the discrete Atari environment. Algorithm 1 details the two-phase learning algorithm we use. ... Table H lists the hyperparameters used for each of these algorithms. (Illustrative sketches of this setup follow the table.) |
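To make the two-phase setup quoted in the Experiment Setup row concrete, here is a minimal sketch of the general idea under simplifying assumptions: a one-shot matrix game with tabular, bandit-style value estimates stands in for the paper's Meta-RL follower, contextual policies, and RLlib/PPO training. The payoff matrices, learning rates, and function names are hypothetical illustrations, not values from the paper.

```python
# Minimal two-phase Stackelberg training sketch (hypothetical toy example).
# Phase 1 trains a follower "meta-policy" conditioned on the leader's action
# (its context); Phase 2 trains the leader against the frozen follower.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical payoffs: the leader picks a row, the follower picks a column.
leader_payoff = np.array([[3.0, 0.0], [4.0, 1.0]])
follower_payoff = np.array([[2.0, 1.0], [0.0, 3.0]])
n_leader, n_follower = leader_payoff.shape

# Phase 1: learn follower values for every (leader action, follower action) pair.
follower_q = np.zeros((n_leader, n_follower))
for _ in range(5000):
    context = rng.integers(n_leader)      # sampled leader behaviour (the context)
    a_f = rng.integers(n_follower)        # exploratory follower action
    r = follower_payoff[context, a_f] + rng.normal(scale=0.1)
    follower_q[context, a_f] += 0.05 * (r - follower_q[context, a_f])

def follower_best_response(context: int) -> int:
    """Frozen follower meta-policy: best response to the observed leader action."""
    return int(np.argmax(follower_q[context]))

# Phase 2: learn leader values against the frozen follower meta-policy.
leader_q = np.zeros(n_leader)
for _ in range(5000):
    a_l = rng.integers(n_leader)
    a_f = follower_best_response(a_l)
    r = leader_payoff[a_l, a_f] + rng.normal(scale=0.1)
    leader_q[a_l] += 0.05 * (r - leader_q[a_l])

print("Leader Stackelberg action:", int(np.argmax(leader_q)))
```

In this toy game the follower's best response to leader action 0 gives the leader a payoff of 3, versus 1 when the leader plays action 1, so the trained leader commits to action 0, the Stackelberg outcome; the myopic temptation to deviate (payoff 4 if the follower did not adapt) is exactly what training against the frozen follower meta-policy in Phase 2 rules out.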
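For the software-dependency row, the snippet below shows one plausible way an RLlib 2.0.0 / Torch experiment of this kind could be configured with RLlib's PPOConfig builder; the environment name and hyperparameter values are placeholders rather than the paper's settings.

```python
# Illustrative RLlib 2.0.0 / Torch configuration (placeholder env and values).
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env="CartPole-v1")    # placeholder; the paper uses its own domains
    .framework("torch")
    .rollouts(num_rollout_workers=1)
    .training(train_batch_size=4000, lr=5e-5)
)
algo = config.build()
print(algo.train()["episode_reward_mean"])
```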