Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms

Authors: Liyuan Zheng, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov, Lillian J. Ratliff

AAAI 2022, pp. 9217-9224 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "From an empirical standpoint, we demonstrate via simple examples that the learning dynamics we study mitigate cycling and accelerate convergence compared to the usual gradient dynamics given cost structures induced by actor-critic formulations. Finally, experiments on Open AI gym environments show that Stackelberg actor-critic algorithms always perform at least as well and often significantly outperform the standard actor-critic algorithm counterparts."
Researcher Affiliation | Academia | "(1) University of Washington, (2) University of California, Santa Cruz. {liyuanz8,fiezt,bchasnov,ratliffl}@uw.edu, zanedma@gmail.com"
Pseudocode | Yes | "Algorithm 1: Stackelberg Actor-Critic Framework" (a hedged sketch of the leader update behind Algorithm 1 follows the table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | "We run experiments on the Open AI gym platform (Brockman et al. 2016) with the Mujoco Physics simulator (Todorov, Erez, and Tassa 2012)." The paper names environments, not a specific dataset with access information.
Dataset Splits | No | The paper describes experiments on OpenAI Gym environments but does not specify any explicit training/validation/test splits or a cross-validation setup.
Hardware Specification | No | The paper states that experiments were run on the OpenAI Gym platform with MuJoCo, but it does not give specific hardware details such as the CPU or GPU models used.
Software Dependencies | No | The paper does not provide version numbers for the software dependencies or libraries used in the implementation.
Experiment Setup | Yes | "We use a learning rate of 10^-4 for the actor and 3 × 10^-4 for the critic in all experiments. For DDPG and SAC, the actor and critic target networks are updated by Polyak averaging with update rate 0.005. The batch size is 256 for all algorithms. For STAC, STDDPG, and STSAC, the regularization parameter λ = 0.001. All networks are fully connected neural networks with two hidden layers of size 256 for both actor and critic networks." (a configuration sketch with these values follows the table)
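The Pseudocode row points to Algorithm 1, the Stackelberg Actor-Critic framework, in which the leader's update replaces the usual gradient with a total derivative that accounts for the follower's implicit best response, regularized by the parameter λ reported in the Experiment Setup row. The snippet below is a minimal sketch of that leader gradient, assuming the actor (x) is the leader and the critic (y) the follower; the toy quadratic costs, small parameter vectors, and the dense linear solve are illustrative placeholders, not the paper's implementation.

# Minimal sketch of a Stackelberg leader gradient (actor leads, critic follows).
# Toy costs and dimensions are assumptions chosen only to exercise the update.
import torch

def flat_grad(output, inputs, **kwargs):
    # Gradient of a scalar w.r.t. a single flat parameter tensor.
    (g,) = torch.autograd.grad(output, inputs, **kwargs)
    return g

def stackelberg_leader_grad(f_leader, f_follower, x, y, lam=1e-3):
    # Total derivative of the leader cost:
    #   Df1 = grad_x f1 - (grad_yx f2)^T (grad_yy f2 + lam*I)^(-1) grad_y f1
    # lam plays the role of the regularization parameter lambda in the paper.
    g1_y = flat_grad(f_leader(x, y), y)                        # grad_y f1
    g2_y = flat_grad(f_follower(x, y), y, create_graph=True)   # grad_y f2

    # Follower Hessian grad_yy f2, built row by row (fine for small y).
    H = torch.stack([flat_grad(g2_y[i], y, retain_graph=True)
                     for i in range(y.numel())])
    w = torch.linalg.solve(H + lam * torch.eye(y.numel()), g1_y)

    # Implicit term (grad_yx f2)^T w via a vector-Jacobian product.
    (implicit,) = torch.autograd.grad(g2_y, x, grad_outputs=w)
    g1_x = flat_grad(f_leader(x, y), x)                        # grad_x f1
    return g1_x - implicit

# Toy quadratic game just to exercise the update.
x = torch.randn(3, requires_grad=True)   # "actor" parameters
y = torch.randn(2, requires_grad=True)   # "critic" parameters
f1 = lambda x, y: (x ** 2).sum() + (x[:2] * y).sum()   # leader cost
f2 = lambda x, y: (y ** 2).sum() - (x[:2] * y).sum()   # follower cost
print(stackelberg_leader_grad(f1, f2, x, y, lam=1e-3))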
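To make the Experiment Setup row concrete, here is a minimal configuration sketch. The layer widths, learning rates, batch size, Polyak rate, and λ are the reported values; the observation/action dimensions, the ReLU activations, and the use of Adam are assumptions made only for illustration.

# Configuration sketch with the reported hyperparameters; dimensions,
# activation, and optimizer choice are assumed, not stated in the paper.
import copy
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 17, 6        # placeholder MuJoCo-like dimensions (assumed)

def mlp(in_dim, out_dim, hidden=256):
    # Fully connected network with two hidden layers of size 256 (activation assumed).
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

actor = mlp(OBS_DIM, ACT_DIM)
critic = mlp(OBS_DIM + ACT_DIM, 1)

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)    # actor lr 10^-4
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)  # critic lr 3 x 10^-4

BATCH_SIZE = 256   # batch size for all algorithms
LAMBDA = 0.001     # regularization parameter for STAC, STDDPG, STSAC
POLYAK = 0.005     # target-network update rate for DDPG and SAC variants

# DDPG/SAC-style target networks, updated by Polyak averaging.
actor_target = copy.deepcopy(actor)
critic_target = copy.deepcopy(critic)

@torch.no_grad()
def polyak_update(net, target, rate=POLYAK):
    for p, p_targ in zip(net.parameters(), target.parameters()):
        p_targ.mul_(1.0 - rate).add_(rate * p)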