Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms
Authors: Liyuan Zheng, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov, Lillian J. Ratliff
AAAI 2022, pp. 9217-9224 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From an empirical standpoint, we demonstrate via simple examples that the learning dynamics we study mitigate cycling and accelerate convergence compared to the usual gradient dynamics given cost structures induced by actor-critic formulations. Finally, experiments on Open AI gym environments show that Stackelberg actor-critic algorithms always perform at least as well and often significantly outperform the standard actor-critic algorithm counterparts. |
| Researcher Affiliation | Academia | 1University of Washington 2University of California, Santa Cruz {liyuanz8,fiezt,bchasnov,ratliffl}@uw.edu, zanedma@gmail.com |
| Pseudocode | Yes | Algorithm 1: Stackelberg Actor-Critic Framework |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. |
| Open Datasets | No | We run experiments on the Open AI gym platform (Brockman et al. 2016) with the Mujoco Physics simulator (Todorov, Erez, and Tassa 2012). The paper mentions environments, not a specific dataset with access information. |
| Dataset Splits | No | The paper discusses experiments run on Open AI gym environments but does not specify any explicit training/test/validation dataset splits or cross-validation setup. |
| Hardware Specification | No | The paper states that experiments were run on the Open AI gym platform with Mujoco, but it does not provide specific hardware details such as CPU or GPU models used for these experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | We use a learning rate of 10^-4 for the actor and 3 × 10^-4 for the critic in all experiments. For DDPG and SAC, the actor and critic target networks are updated by Polyak averaging with update rate 0.005. The batch size is 256 for all algorithms. For STAC, STDDPG, and STSAC, the regularization parameter λ = 0.001. All networks are fully connected neural networks with two hidden layers of size 256 for both actor and critic networks. |
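
For reference, the hyperparameters reported in the Experiment Setup row can be collected into a short configuration sketch. The snippet below is a minimal illustration in PyTorch, assuming a standard actor-critic setup; the function names, variable names, and example observation/action dimensions are assumptions (the paper does not release code), and only the numeric values come from the table above.

```python
# Hedged sketch of the reported experiment configuration (not the authors' code).
import torch
import torch.nn as nn

# Hyperparameters quoted in the Experiment Setup row.
ACTOR_LR = 1e-4             # actor learning rate
CRITIC_LR = 3e-4            # critic learning rate
POLYAK_TAU = 0.005          # target-network update rate for DDPG/SAC variants
BATCH_SIZE = 256            # minibatch size for all algorithms
STACKELBERG_LAMBDA = 1e-3   # regularization parameter for STAC / STDDPG / STSAC
HIDDEN_SIZE = 256           # two fully connected hidden layers of this width


def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    """Two-hidden-layer fully connected network, as described in the paper."""
    return nn.Sequential(
        nn.Linear(in_dim, HIDDEN_SIZE), nn.ReLU(),
        nn.Linear(HIDDEN_SIZE, HIDDEN_SIZE), nn.ReLU(),
        nn.Linear(HIDDEN_SIZE, out_dim),
    )


if __name__ == "__main__":
    obs_dim, act_dim = 17, 6  # example dimensions for a MuJoCo-style task (assumed)
    actor = mlp(obs_dim, act_dim)
    critic = mlp(obs_dim + act_dim, 1)  # Q(s, a) critic
    actor_opt = torch.optim.Adam(actor.parameters(), lr=ACTOR_LR)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=CRITIC_LR)

    # Polyak averaging of target-network parameters, used by the DDPG/SAC variants.
    critic_target = mlp(obs_dim + act_dim, 1)
    critic_target.load_state_dict(critic.state_dict())
    with torch.no_grad():
        for p, p_targ in zip(critic.parameters(), critic_target.parameters()):
            p_targ.mul_(1 - POLYAK_TAU).add_(POLYAK_TAU * p)
```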