Hessian Aided Policy Gradient

Authors: Zebang Shen, Alejandro Ribeiro, Hamed Hassani, Hui Qian, Chao Mi

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Simulations on standard tasks validate the efficiency of our method." "We also provide extensive simulations on various reinforcement learning tasks to validate our analysis and illustrate the efficiency of our method." "In this section, we evaluate the performance of the proposed HAPG method on several standard reinforcement learning tasks."
Researcher Affiliation | Academia | Zhejiang University; University of Pennsylvania.
Pseudocode | Yes | "Algorithm 1 Hessian Aided Policy Gradient (HAPG)" (a toy sketch of the recursion this algorithm implements is given after the table)
Open Source Code | Yes | "More details of the parameter choices and the URL of our code are given in the appendix."
Open Datasets | Yes | "The REINFORCE method from (Sutton et al., 2000) is used as the baseline for comparison. The performance of HAPG and REINFORCE are tested on six continuous RL tasks, namely Cart Pole, Swimmer, 2d Walker, Reacher, Humanoid, and Humanoid Standup, where the latter five are commonly used Mujoco environments (Todorov et al., 2012)." (a sketch instantiating these environments appears after the table)
Dataset Splits | No | The paper reports hyperparameters and optimization settings but no explicit training/validation/test splits. While it uses standard Mujoco environments, it does not describe any partitioning of the data generated in them for these purposes.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions "the garage library (Duan et al., 2016)" but does not provide specific version numbers for this or any other software component used in the experiments.
Experiment Setup | Yes | "In our experiments, we set the hyper-parameters such as the mini-batch size (|M0| and |M|), the epoch length p, and step-size according to our analysis. Specifically, (1) the mini-batch size of REINFORCE is set to the same value as |M0| in HAPG, which is obtained via grid-search, (2) |M| and p are set to satisfy |M| · p = |M0|, and (3) the step-sizes of HAPG and REINFORCE are both set to be a small constant value 0.01. We use deep Gaussian policy with the mean and variance parameterized by a fully-connected neural network. The number of network layers and hidden units, and the nonlinear activation functions follows (Papini et al., 2018) with the details given in the appendix." (hedged configuration and policy sketches follow below)
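
The paper's Algorithm 1 is not reproduced here, but its general structure, a checkpoint gradient estimated on a large batch |M0| followed by p Hessian-vector-product corrections on small batches |M|, can be illustrated on a toy problem. The sketch below is a minimal illustration, not the authors' implementation: the quadratic objective, the noise model, the concrete batch sizes, the normalized step, and the uniform sampling along the segment between iterates are all assumptions standing in for the paper's trajectory-based estimators.

```python
import numpy as np

# Toy stochastic objective J(theta) = -0.5 * theta^T A theta, with noisy
# gradient / Hessian-vector-product oracles standing in for the paper's
# trajectory-based policy-gradient estimators.
rng = np.random.default_rng(0)
A = np.diag([1.0, 4.0])

def stoch_grad(theta, batch):
    # Noisy gradient estimate; noise shrinks with batch size.
    return -A @ theta + rng.normal(0, 1.0 / np.sqrt(batch), size=theta.shape)

def stoch_hvp(theta, v, batch):
    # Noisy Hessian-vector product at theta applied to direction v.
    noise = rng.normal(0, 1.0 / np.sqrt(batch), size=v.shape) * np.linalg.norm(v)
    return -A @ v + noise

def hapg_sketch(theta0, outer_iters=20, p=10, M0=1000, M=100, eta=0.1):
    theta = theta0.copy()
    for _ in range(outer_iters):
        # Checkpoint: large-batch gradient estimate (batch size |M0|).
        v = stoch_grad(theta, M0)
        for _ in range(p):
            theta_new = theta + eta * v / max(np.linalg.norm(v), 1e-12)
            # Hessian-aided correction: approximate the gradient difference
            # grad J(theta_new) - grad J(theta) by an HVP at a uniformly
            # sampled point on the segment, using a small batch |M|.
            alpha = rng.uniform()
            mid = theta + alpha * (theta_new - theta)
            v = v + stoch_hvp(mid, theta_new - theta, M)
            theta = theta_new
    return theta

print("final parameters:", hapg_sketch(np.array([3.0, -2.0])))
```

The key point the sketch captures is that the inner loop never recomputes a full gradient: it maintains the estimate v recursively, which is what allows the small inner batch |M| subject to |M| · p = |M0|.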
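For the experiment setup, the constraint |M| · p = |M0| and the constant step size 0.01 come straight from the quoted passage; the concrete batch sizes and the Gym environment IDs below are assumptions for illustration (the paper predates Gym's v3/v4 MuJoCo tasks, so v2 IDs are a guess, and Cart Pole is a classic-control task rather than a MuJoCo one).

```python
import gym

# Hypothetical environment IDs for the six tasks named in the paper.
ENV_IDS = ["CartPole-v1", "Swimmer-v2", "Walker2d-v2",
           "Reacher-v2", "Humanoid-v2", "HumanoidStandup-v2"]

config = {
    "M0": 1000,         # checkpoint batch size, shared with REINFORCE (grid-searched in the paper; value here is illustrative)
    "p": 10,            # epoch length; together with M it must satisfy M * p = M0
    "step_size": 0.01,  # constant step size for both HAPG and REINFORCE, per the paper
}
config["M"] = config["M0"] // config["p"]   # inner batch size from the constraint
assert config["M"] * config["p"] == config["M0"]

for env_id in ENV_IDS:
    env = gym.make(env_id)
    print(env_id, env.observation_space.shape, env.action_space)
```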
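The "deep Gaussian policy" can likewise be sketched as a diagonal Gaussian whose mean and (log) standard deviation are both produced by a fully connected network, matching the quoted description. The layer widths, tanh activation, and log-std clamp below are placeholders; the paper takes its architecture from Papini et al. (2018), with details in its appendix.

```python
import torch
import torch.nn as nn

class GaussianMLPPolicy(nn.Module):
    """Diagonal Gaussian policy: a fully connected body feeds two heads,
    one for the action mean and one for the log standard deviation.
    Architecture details are illustrative, not the paper's exact choices."""

    def __init__(self, obs_dim, act_dim, hidden=(64, 64)):
        super().__init__()
        layers, last = [], obs_dim
        for width in hidden:
            layers += [nn.Linear(last, width), nn.Tanh()]
            last = width
        self.body = nn.Sequential(*layers)
        self.mean_head = nn.Linear(last, act_dim)
        self.log_std_head = nn.Linear(last, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        mean = self.mean_head(h)
        log_std = self.log_std_head(h).clamp(-5.0, 2.0)  # clamp for numerical stability
        return torch.distributions.Normal(mean, log_std.exp())

# Usage: sample actions and score their log-likelihood for a batch of states.
policy = GaussianMLPPolicy(obs_dim=8, act_dim=2)
dist = policy(torch.randn(5, 8))
actions = dist.sample()                      # shape [5, 2]
log_probs = dist.log_prob(actions).sum(-1)   # per-state action log-likelihood
```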