Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
Authors: Matthias Gerstgrasser, David C. Parkes
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. |
| Researcher Affiliation | Collaboration | (1) John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA; (2) Computer Science Department, Stanford University, Stanford, CA, USA; (3) DeepMind, London, UK. |
| Pseudocode | Yes | Algorithm 1 in the Appendix details this in pseudo-code. |
| Open Source Code | No | The paper does not provide a statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate our Meta-RL approach on both a benchmark iterated matrix game domain... as well as on a novel Atari 2600-based domain... We modify the Atari 2600 game Space Invaders. |
| Dataset Splits | No | The paper describes the experimental setup, but does not provide specific details on training, validation, and test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | Experiments were run on recent Intel Xeon processors with a single core and 2GB RAM per experiment. |
| Software Dependencies | Yes | All experiments were implemented using Ray / RLlib 2.0.0 (Liang et al., 2018)... and using Torch. Any hyperparameters not listed were left at default values in rllib version 2.0.0. |
| Experiment Setup | Yes | We further use a two-stage training approach. In Phase 1, we train a follower meta-policy... In Phase 2, we train a leader policy... We use n = 10 steps per episode. ...we use discrete prices (0, 0.25, 0.5, 0.75, 1.0) for compatibility with the discrete Atari environment. Algorithm 1 details the two-phase learning algorithm we use. ... Table H lists the hyperparameters used for each of these algorithms. (Illustrative sketches of this setup follow the table.) |
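To make the two-phase setup quoted in the Experiment Setup row concrete, here is a minimal sketch of the general idea under simplifying assumptions: a one-shot matrix game with tabular, bandit-style value estimates stands in for the paper's Meta-RL follower, contextual policies, and RLlib/PPO training. The payoff matrices, learning rates, and function names are hypothetical illustrations, not values from the paper.

```python
# Minimal two-phase Stackelberg training sketch (hypothetical toy example).
# Phase 1 trains a follower "meta-policy" conditioned on the leader's action
# (its context); Phase 2 trains the leader against the frozen follower.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical payoffs: the leader picks a row, the follower picks a column.
leader_payoff = np.array([[3.0, 0.0], [4.0, 1.0]])
follower_payoff = np.array([[2.0, 1.0], [0.0, 3.0]])
n_leader, n_follower = leader_payoff.shape

# Phase 1: learn follower values for every (leader action, follower action) pair.
follower_q = np.zeros((n_leader, n_follower))
for _ in range(5000):
    context = rng.integers(n_leader)      # sampled leader behaviour (the context)
    a_f = rng.integers(n_follower)        # exploratory follower action
    r = follower_payoff[context, a_f] + rng.normal(scale=0.1)
    follower_q[context, a_f] += 0.05 * (r - follower_q[context, a_f])

def follower_best_response(context: int) -> int:
    """Frozen follower meta-policy: best response to the observed leader action."""
    return int(np.argmax(follower_q[context]))

# Phase 2: learn leader values against the frozen follower meta-policy.
leader_q = np.zeros(n_leader)
for _ in range(5000):
    a_l = rng.integers(n_leader)
    a_f = follower_best_response(a_l)
    r = leader_payoff[a_l, a_f] + rng.normal(scale=0.1)
    leader_q[a_l] += 0.05 * (r - leader_q[a_l])

print("Leader Stackelberg action:", int(np.argmax(leader_q)))
```

In this toy game the follower's best response to leader action 0 gives the leader a payoff of 3, versus 1 when the leader plays action 1, so the trained leader commits to action 0, the Stackelberg outcome; the myopic temptation to deviate (payoff 4 if the follower did not adapt) is exactly what training against the frozen follower meta-policy in Phase 2 rules out.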
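For the software-dependency row, the snippet below shows one plausible way an RLlib 2.0.0 / Torch experiment of this kind could be configured with RLlib's PPOConfig builder; the environment name and hyperparameter values are placeholders rather than the paper's settings.

```python
# Illustrative RLlib 2.0.0 / Torch configuration (placeholder env and values).
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment(env="CartPole-v1")    # placeholder; the paper uses its own domains
    .framework("torch")
    .rollouts(num_rollout_workers=1)
    .training(train_batch_size=4000, lr=5e-5)
)
algo = config.build()
print(algo.train()["episode_reward_mean"])
```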