reproducibilityindex.ai

Generalized Off-Policy Actor-Critic

Authors: Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

NeurIPS 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the merits of Geoff-PAC over existing algorithms in Mujoco robot simulation tasks, the first empirical success of emphatic algorithms in prevailing deep RL benchmarks.
Researcher Affiliation	Academia	Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson Department of Computer Science University of Oxford {shangtong.zhang, wendelin.boehmer, shimon.whiteson}@cs.ox.ac.uk
Pseudocode	No	Pseudocode of Geoff-PAC is provided in supplementary materials.
Open Source Code	Yes	More details are provided in supplementary materials and all the implementations are publicly available. (Footnote 5: https://github.com/ShangtongZhang/DeepRL)
Open Datasets	Yes	We benchmarked Off-PAC, ACE, DDPG, TD3, and Geoff-PAC on five Mujoco robot simulation tasks from OpenAI gym (Brockman et al., 2016).
Dataset Splits	No	The paper describes using Mujoco robot simulation tasks, which are environments for reinforcement learning, not static datasets with explicit train/validation/test splits in the traditional supervised learning sense. No specific percentages or sample counts for data splits are provided.
Hardware Specification	No	The paper mentions 'a generous equipment grant from NVIDIA' but does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for running the experiments.
Software Dependencies	No	The paper does not list specific version numbers for software dependencies or libraries used in the implementation or experiments. It only mentions that 'all the implementations are publicly available'.
Experiment Setup	Yes	To stabilize training, we adopted the A2C (Clemente et al., 2017) paradigm with multiple workers and utilized a target network (Mnih et al., 2015) and a replay buffer (Lin, 1992). All three algorithms share the same architecture and the same parameterization. We found ACE was not sensitive to λ1 and set λ1 = 0 for all experiments. For Geoff-PAC, we found λ1 = 0.7, λ2 = 0.6, γ̂ = 0.2 produced good empirical results and used this combination for all remaining tasks.
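
The hyperparameters quoted in the Experiment Setup row can be collected in a small configuration object when attempting a reproduction. The following is a minimal sketch only: the class name `EmphaticConfig`, its field names, and the `CONFIGS` dictionary are hypothetical and do not come from the paper or its repository, and the range check assumes that λ1, λ2, and γ̂ each lie in [0, 1]. Only the numeric values are taken from the row above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmphaticConfig:
    """Hypothetical container for the settings quoted in the Experiment Setup row."""

    algorithm: str
    lambda1: float                     # λ1 as reported
    lambda2: Optional[float] = None    # λ2, reported only for Geoff-PAC
    gamma_hat: Optional[float] = None  # γ̂, reported only for Geoff-PAC

    def validate(self) -> None:
        # Assumption: each reported coefficient lies in the closed interval [0, 1].
        for name in ("lambda1", "lambda2", "gamma_hat"):
            value = getattr(self, name)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [0, 1]")


# Values copied from the quoted setup; nothing else is filled in.
CONFIGS = {
    "ACE": EmphaticConfig(algorithm="ACE", lambda1=0.0),
    "Geoff-PAC": EmphaticConfig(
        algorithm="Geoff-PAC", lambda1=0.7, lambda2=0.6, gamma_hat=0.2
    ),
}

for config in CONFIGS.values():
    config.validate()
```

Settings that the row does not report (learning rates, number of workers, replay buffer size, target network update interval) are deliberately left out of the sketch rather than guessed.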