Hessian Aided Policy Gradient

Authors: Zebang Shen, Alejandro Ribeiro, Hamed Hassani, Hui Qian, Chao Mi

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Simulations on standard tasks validate the efficiency of our method." "We also provide extensive simulations on various reinforcement learning tasks to validate our analysis and illustrate the efficiency of our method." "In this section, we evaluate the performance of the proposed HAPG method on several standard reinforcement learning tasks."
Researcher Affiliation | Academia | Zhejiang University; University of Pennsylvania.
Pseudocode | Yes | "Algorithm 1 Hessian Aided Policy Gradient (HAPG)" (a toy sketch of the recursion this algorithm implements is given after the table)
Open Source Code | Yes | "More details of the parameter choices and the URL of our code are given in the appendix."
Open Datasets | Yes | "The REINFORCE method from (Sutton et al., 2000) is used as the baseline for comparison. The performance of HAPG and REINFORCE are tested on six continuous RL tasks, namely Cart Pole, Swimmer, 2d Walker, Reacher, Humanoid, and Humanoid Standup, where the latter five are commonly used Mujoco environments (Todorov et al., 2012)." (a sketch instantiating these environments appears after the table)
Dataset Splits | No | The paper reports hyperparameters and optimization settings but no explicit training/validation/test splits. While it uses standard Mujoco environments, it does not describe any partitioning of the data generated in them for these purposes.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions "the garage library (Duan et al., 2016)" but does not provide specific version numbers for this or any other software component used in the experiments.
Experiment Setup | Yes | "In our experiments, we set the hyper-parameters such as the mini-batch size (|M0| and |M|), the epoch length p, and step-size according to our analysis. Specifically, (1) the mini-batch size of REINFORCE is set to the same value as |M0| in HAPG, which is obtained via grid-search, (2) |M| and p are set to satisfy |M| · p = |M0|, and (3) the step-sizes of HAPG and REINFORCE are both set to be a small constant value 0.01. We use deep Gaussian policy with the mean and variance parameterized by a fully-connected neural network. The number of network layers and hidden units, and the nonlinear activation functions follows (Papini et al., 2018) with the details given in the appendix." (hedged configuration and policy sketches follow below)
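
The paper's Algorithm 1 is not reproduced here, but its general structure, a checkpoint gradient estimated on a large batch |M0| followed by p Hessian-vector-product corrections on small batches |M|, can be illustrated on a toy problem. The sketch below is a minimal illustration, not the authors' implementation: the quadratic objective, the noise model, the concrete batch sizes, the normalized step, and the uniform sampling along the segment between iterates are all assumptions standing in for the paper's trajectory-based estimators.

```python
import numpy as np

# Toy stochastic objective J(theta) = -0.5 * theta^T A theta, with noisy
# gradient / Hessian-vector-product oracles standing in for the paper's
# trajectory-based policy-gradient estimators.
rng = np.random.default_rng(0)
A = np.diag([1.0, 4.0])

def stoch_grad(theta, batch):
    # Noisy gradient estimate; noise shrinks with batch size.
    return -A @ theta + rng.normal(0, 1.0 / np.sqrt(batch), size=theta.shape)

def stoch_hvp(theta, v, batch):
    # Noisy Hessian-vector product at theta applied to direction v.
    noise = rng.normal(0, 1.0 / np.sqrt(batch), size=v.shape) * np.linalg.norm(v)
    return -A @ v + noise

def hapg_sketch(theta0, outer_iters=20, p=10, M0=1000, M=100, eta=0.1):
    theta = theta0.copy()
    for _ in range(outer_iters):
        # Checkpoint: large-batch gradient estimate (batch size |M0|).
        v = stoch_grad(theta, M0)
        for _ in range(p):
            theta_new = theta + eta * v / max(np.linalg.norm(v), 1e-12)
            # Hessian-aided correction: approximate the gradient difference
            # grad J(theta_new) - grad J(theta) by an HVP at a uniformly
            # sampled point on the segment, using a small batch |M|.
            alpha = rng.uniform()
            mid = theta + alpha * (theta_new - theta)
            v = v + stoch_hvp(mid, theta_new - theta, M)
            theta = theta_new
    return theta

print("final parameters:", hapg_sketch(np.array([3.0, -2.0])))
```

The key point the sketch captures is that the inner loop never recomputes a full gradient: it maintains the estimate v recursively, which is what allows the small inner batch |M| subject to |M| · p = |M0|.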
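For the experiment setup, the constraint |M| · p = |M0| and the constant step size 0.01 come straight from the quoted passage; the concrete batch sizes and the Gym environment IDs below are assumptions for illustration (the paper predates Gym's v3/v4 MuJoCo tasks, so v2 IDs are a guess, and Cart Pole is a classic-control task rather than a MuJoCo one).

```python
import gym

# Hypothetical environment IDs for the six tasks named in the paper.
ENV_IDS = ["CartPole-v1", "Swimmer-v2", "Walker2d-v2",
           "Reacher-v2", "Humanoid-v2", "HumanoidStandup-v2"]

config = {
    "M0": 1000,         # checkpoint batch size, shared with REINFORCE (grid-searched in the paper; value here is illustrative)
    "p": 10,            # epoch length; together with M it must satisfy M * p = M0
    "step_size": 0.01,  # constant step size for both HAPG and REINFORCE, per the paper
}
config["M"] = config["M0"] // config["p"]   # inner batch size from the constraint
assert config["M"] * config["p"] == config["M0"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    print(env_id, env.observation_space.shape, env.action_space)
```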
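The "deep Gaussian policy" can likewise be sketched as a diagonal Gaussian whose mean and (log) standard deviation are both produced by a fully connected network, matching the quoted description. The layer widths, tanh activation, and log-std clamp below are placeholders; the paper takes its architecture from Papini et al. (2018), with details in its appendix.

```python
import torch
import torch.nn as nn

class GaussianMLPPolicy(nn.Module):
    """Diagonal Gaussian policy: a fully connected body feeds two heads,
    one for the action mean and one for the log standard deviation.
    Architecture details are illustrative, not the paper's exact choices."""

    def __init__(self, obs_dim, act_dim, hidden=(64, 64)):
        super().__init__()
        layers, last = [], obs_dim
        for width in hidden:
            layers += [nn.Linear(last, width), nn.Tanh()]
            last = width
        self.body = nn.Sequential(*layers)
        self.mean_head = nn.Linear(last, act_dim)
        self.log_std_head = nn.Linear(last, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)  # clamp for numerical stability
        return torch.distributions.Normal(mean, log_std.exp())

# Usage: sample actions and score their log-likelihood for a batch of states.
policy = GaussianMLPPolicy(obs_dim=8, act_dim=2)
dist = policy(torch.randn(5, 8))
actions = dist.sample()                      # shape [5, 2]
log_probs = dist.log_prob(actions).sum(-1)   # per-state action log-likelihood
```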