Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
Authors: Matthias Gerstgrasser, David C. Parkes
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. |
| Researcher Affiliation | Collaboration | ¹John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA, USA; ²Computer Science Department, Stanford University, Stanford, CA, USA; ³Deepmind, London, UK. |
| Pseudocode | Yes | Algorithm 1 in the Appendix details this in pseudo-code. |
| Open Source Code | No | The paper does not provide a statement or link indicating that its source code is publicly available. |
| Open Datasets | Yes | We evaluate our Meta-RL approach on both a benchmark iterated matrix game domain... as well as on a novel Atari 2600-based domain... We modify the Atari 2600 game Space Invaders. |
| Dataset Splits | No | The paper describes the experimental setup, but does not provide specific details on training, validation, and test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | Experiments were run on recent Intel Xeon processors with a single core and 2GB RAM per experiment. |
| Software Dependencies | Yes | All experiments were implemented using Ray / RLlib 2.0.0 (Liang et al., 2018)... and using Torch. Any hyperparameters not listed were left at default values in rllib version 2.0.0. |
| Experiment Setup | Yes | We further use a two-stage training approach. In Phase 1, we train a follower meta-policy... In Phase 2, we train a leader policy... We use n = 10 steps per episode. ...we use discrete prices (0, 0.25, 0.5, 0.75, 1.0) for compatibility with the discrete Atari environment. Algorithm 1 details the two-phase learning algorithm we use. ... Table H lists the hyperparameters used for each of these algorithms. |
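
The Experiment Setup row mentions a discrete price grid of five values between 0 and 1. As a rough illustration only, the sketch below shows one way such a grid could be mapped to and from discrete action indices. The names `PRICE_GRID`, `action_to_price`, and `price_to_action` are hypothetical and do not come from the paper; only the five price values are taken from the quoted text.

```python
# Minimal sketch, not the authors' implementation.
# Only the five price values come from the quoted text; all names are assumed.

NUM_PRICES = 5
# Evenly spaced prices: 0.0, 0.25, 0.5, 0.75, 1.0
PRICE_GRID = [i / (NUM_PRICES - 1) for i in range(NUM_PRICES)]


def action_to_price(action: int) -> float:
    """Map a discrete action index (0..NUM_PRICES-1) to its price."""
    if not 0 <= action < NUM_PRICES:
        raise ValueError(f"action must be in [0, {NUM_PRICES - 1}], got {action}")
    return PRICE_GRID[action]


def price_to_action(price: float) -> int:
    """Return the index of the grid price closest to `price`."""
    return min(range(NUM_PRICES), key=lambda i: abs(PRICE_GRID[i] - price))
```

For example, `action_to_price(3)` picks the fourth grid entry, and `price_to_action(0.6)` snaps a continuous value to the nearest grid index. How the paper actually encodes prices in its environment is not stated in the excerpt above.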