Momentum-Based Policy Gradient Methods
Authors: Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms. |
| Researcher Affiliation | Collaboration | 1Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, USA 2School of Computing Science, Simon Fraser University, Vancouver, Canada 3JD Finance America Corporation, Mountain View, CA, USA. |
| Pseudocode | Yes | Algorithm 1 Important-Sampling Momentum-Based Policy Gradient (IS-MBPG) Algorithm |
| Open Source Code | Yes | Our code is publicly available on https://github.com/gaosh/MBPG. |
| Open Datasets | Yes | In this section, we demonstrate the performance of our algorithms on four standard reinforcement learning tasks, which are Cart Pole, Walker, Half Cheetah and Hopper. ... Previous works mostly use environments implemented by old versions of garage, while latest version of garage directly use environments from gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper uses standard reinforcement learning environments but does not explicitly provide training, validation, and test dataset splits or methodologies. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper names its implementation libraries ("We implement our algorithms by using garage (garage contributors, 2019) and pytorch (Paszke et al., 2019).") but does not specify the version numbers needed to reproduce the software environment. |
| Experiment Setup | Yes | In the experiment, we use Categorical Policy for Cart Pole, and Gaussian Policy for all the other environments. All Policies are parameterized by the fully connected neural network. The detail of network architecture and activation function used are shown in the Appendix A. ... We use the same batch size |B| for all algorithms, though our algorithms do not have a requirement on it. HAPG and SRVR-PG have sub-iterations (or inner loop), and requires additional hyper-parameters. The inner batch size for HAPG and SRVR-PG is also set to be the same value. ... The more details of hyper-parameter selection are shown in Appendix A. |
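
The Experiment Setup row states that Cart Pole uses a Categorical policy and the continuous-control tasks (Walker, Half Cheetah, Hopper) use Gaussian policies, all parameterized by fully connected networks whose details are deferred to the paper's Appendix A. The following is a minimal sketch of that setup with gym and PyTorch; the hidden width and tanh activation are placeholder assumptions, not the paper's reported architecture.

```python
# Minimal sketch of the policy parameterizations described in the Experiment
# Setup row, assuming gym environments and PyTorch. Hidden size and activation
# are placeholders; the paper's actual choices are listed in its Appendix A.
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class CategoricalPolicy(nn.Module):
    """Fully connected policy for discrete actions (e.g. Cart Pole)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def dist(self, obs):
        return Categorical(logits=self.net(obs))

class GaussianPolicy(nn.Module):
    """Fully connected policy for continuous actions (Walker, Half Cheetah, Hopper)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return Normal(self.mean_net(obs), self.log_std.exp())

# Example: the discrete Cart Pole task from gym.
env = gym.make("CartPole-v1")
policy = CategoricalPolicy(env.observation_space.shape[0], env.action_space.n)
```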
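The Pseudocode row points to Algorithm 1 (IS-MBPG), whose core step is a STORM-style momentum estimator that mixes the current policy gradient with an importance-weighted correction computed at the previous parameters. The sketch below, reusing the `dist` interface from the policy sketch above, assumes a plain REINFORCE estimator on a single discrete-action trajectory and fixed `eta`/`beta`; the paper's algorithm instead uses adaptive, decaying schedules and a normalized update, so this is illustrative only, not the authors' implementation.

```python
# A hedged sketch of the importance-sampling momentum estimator behind IS-MBPG:
#   u_t = beta * g(tau_t; theta_t)
#         + (1 - beta) * [u_{t-1} + g(tau_t; theta_t) - w * g(tau_t; theta_{t-1})]
# where w is the importance weight of tau_t under the previous policy.
import copy
import torch

def reinforce_grad(policy, traj):
    """REINFORCE gradient estimate g(tau; theta) for one trajectory."""
    obs, acts, rets = traj                      # returns-to-go precomputed
    logp = policy.dist(obs).log_prob(acts)      # [T]; discrete-action case
    obj = (logp * rets).sum()                   # surrogate objective
    return torch.autograd.grad(obj, list(policy.parameters()))

def importance_weight(policy_old, policy_new, traj):
    """w(tau | theta_old, theta_new) = prod_t pi_old(a|s) / pi_new(a|s)."""
    obs, acts, _ = traj
    with torch.no_grad():
        logp_old = policy_old.dist(obs).log_prob(acts).sum()
        logp_new = policy_new.dist(obs).log_prob(acts).sum()
    return torch.exp(logp_old - logp_new)

def is_mbpg_step(policy, policy_prev, u_prev, traj, eta=1e-3, beta=0.2):
    """One momentum update; eta and beta are fixed placeholders here."""
    g_new = reinforce_grad(policy, traj)
    if u_prev is None:                          # first iteration: plain gradient
        u = g_new
    else:
        g_old = reinforce_grad(policy_prev, traj)
        w = importance_weight(policy_prev, policy, traj)
        u = [beta * gn + (1 - beta) * (up + gn - w * go)
             for gn, up, go in zip(g_new, u_prev, g_old)]
    policy_prev = copy.deepcopy(policy)         # remember theta_t before stepping
    with torch.no_grad():                       # gradient ascent on the return
        for p, u_i in zip(policy.parameters(), u):
            p.add_(eta * u_i)
    return policy_prev, u
```

A full training loop would sample a fresh trajectory from the updated policy at every iteration, pass it to `is_mbpg_step` together with the returned previous policy and momentum estimate, and decay the step size and momentum weight over time as Algorithm 1 does with its adaptive schedules.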