Generalized Off-Policy Actor-Critic

Authors: Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the merits of Geoff-PAC over existing algorithms in MuJoCo robot simulation tasks, the first empirical success of emphatic algorithms in prevailing deep RL benchmarks.
Researcher Affiliation | Academia | Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson; Department of Computer Science, University of Oxford; {shangtong.zhang, wendelin.boehmer, shimon.whiteson}@cs.ox.ac.uk
Pseudocode | No | The paper states that pseudocode of Geoff-PAC is provided in the supplementary materials, i.e., it does not appear in the main text.
Open Source Code | Yes | More details are provided in supplementary materials and all the implementations are publicly available at https://github.com/ShangtongZhang/DeepRL.
Open Datasets | Yes | We benchmarked Off-PAC, ACE, DDPG, TD3, and Geoff-PAC on five MuJoCo robot simulation tasks from OpenAI Gym (Brockman et al., 2016). (A minimal environment-setup sketch follows the table.)
Dataset Splits | No | The paper uses MuJoCo simulation environments for reinforcement learning, not static datasets, so no train/validation/test splits (percentages or sample counts) are reported.
Hardware Specification | No | The paper acknowledges 'a generous equipment grant from NVIDIA' but does not specify GPU models, CPU models, or other hardware details used for the experiments.
Software Dependencies | No | The paper does not list version numbers for the software libraries used in the implementation or experiments; it only notes that 'all the implementations are publicly available'.
Experiment Setup | Yes | To stabilize training, we adopted the A2C (Clemente et al., 2017) paradigm with multiple workers and utilized a target network (Mnih et al., 2015) and a replay buffer (Lin, 1992). All three algorithms share the same architecture and the same parameterization. We found ACE was not sensitive to λ₁ and set λ₁ = 0 for all experiments. For Geoff-PAC, we found λ₁ = 0.7, λ₂ = 0.6, γ̂ = 0.2 produced good empirical results and used this combination for all remaining tasks. (A configuration sketch follows the table.)
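
As a point of reference for the Open Datasets row, below is a minimal sketch of how MuJoCo tasks are typically instantiated through the OpenAI Gym API. The paper does not name the five tasks in the quoted text, so the environment IDs here are illustrative assumptions, and the snippet uses a random policy rather than the authors' learned actor.

```python
import gym

# Hypothetical task list: the five MuJoCo tasks are not named in the quoted
# text, so these environment IDs are illustrative only.
TASKS = ["HalfCheetah-v2", "Walker2d-v2", "Hopper-v2", "Swimmer-v2", "Reacher-v2"]

for task in TASKS:
    # gym.make requires the MuJoCo bindings; the 4-tuple step() API below
    # assumes a classic Gym release (pre-0.26).
    env = gym.make(task)
    obs = env.reset()
    done, episode_return = False, 0.0
    while not done:
        action = env.action_space.sample()  # random policy as a stand-in for the learned actor
        obs, reward, done, info = env.step(action)
        episode_return += reward
    print(f"{task}: random-policy return = {episode_return:.1f}")
    env.close()
```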
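To make the Experiment Setup row concrete, here is a minimal configuration sketch collecting the reported hyperparameters. The class and field names are assumptions for illustration, not the authors' code; only the values (λ₁ = 0.7, λ₂ = 0.6, γ̂ = 0.2 for Geoff-PAC, and λ₁ = 0 for ACE) come from the paper, while the worker count is a placeholder.

```python
from dataclasses import dataclass

@dataclass
class GeoffPACConfig:
    """Hypothetical configuration container; field names are illustrative.
    The numeric defaults are the values reported in the paper."""
    lambda1: float = 0.7     # trace parameter lambda_1
    lambda2: float = 0.6     # trace parameter lambda_2
    gamma_hat: float = 0.2   # auxiliary discount gamma-hat
    num_workers: int = 16    # A2C-style parallel workers; the exact count is an assumption
    target_network: bool = True  # stabilization via a target network (Mnih et al., 2015)
    replay_buffer: bool = True   # experience replay (Lin, 1992)

# Geoff-PAC uses the combination reported in the paper.
geoff_pac = GeoffPACConfig()

# ACE was found insensitive to lambda_1, so the paper sets lambda_1 = 0;
# lambda_2 and gamma_hat are specific to Geoff-PAC and unused by ACE.
ace = GeoffPACConfig(lambda1=0.0)
```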