Momentum-Based Policy Gradient Methods
Authors: Feihu Huang, Shangqian Gao, Jian Pei, Heng Huang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms. |
| Researcher Affiliation | Collaboration | 1Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, USA 2School of Computing Science, Simon Fraser University, Vancouver, Canada 3JD Finance America Corporation, Mountain View, CA, USA. |
| Pseudocode | Yes | Algorithm 1 Important-Sampling Momentum-Based Policy Gradient (IS-MBPG) Algorithm |
| Open Source Code | Yes | Our code is publicly available on https://github.com/gaosh/MBPG. |
| Open Datasets | Yes | In this section, we demonstrate the performance of our algorithms on four standard reinforcement learning tasks, which are Cart Pole, Walker, Half Cheetah and Hopper. ... Previous works mostly use environments implemented by old versions of garage, while latest version of garage directly use environments from gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper uses standard reinforcement learning environments but does not explicitly provide training, validation, and test dataset splits or methodologies. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper names its implementation libraries ("We implement our algorithms by using garage (garage contributors, 2019) and pytorch (Paszke et al., 2019).") but does not specify the version numbers needed to reproduce the software environment. |
| Experiment Setup | Yes | In the experiment, we use Categorical Policy for Cart Pole, and Gaussian Policy for all the other environments. All Policies are parameterized by the fully connected neural network. The detail of network architecture and activation function used are shown in the Appendix A. ... We use the same batch size |B| for all algorithms, though our algorithms do not have a requirement on it. HAPG and SRVR-PG have sub-iterations (or inner loop), and requires additional hyper-parameters. The inner batch size for HAPG and SRVR-PG is also set to be the same value. ... The more details of hyper-parameter selection are shown in Appendix A. |
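
The Experiment Setup row states that Cart Pole uses a Categorical policy and the continuous-control tasks (Walker, Half Cheetah, Hopper) use Gaussian policies, all parameterized by fully connected networks whose details are deferred to the paper's Appendix A. The following is a minimal sketch of that setup with gym and PyTorch; the hidden width and tanh activation are placeholder assumptions, not the paper's reported architecture.

```python
# Minimal sketch of the policy parameterizations described in the Experiment
# Setup row, assuming gym environments and PyTorch. Hidden size and activation
# are placeholders; the paper's actual choices are listed in its Appendix A.
import gym
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class CategoricalPolicy(nn.Module):
    """Fully connected policy for discrete actions (e.g. Cart Pole)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def dist(self, obs):
        return Categorical(logits=self.net(obs))

class GaussianPolicy(nn.Module):
    """Fully connected policy for continuous actions (Walker, Half Cheetah, Hopper)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        return Normal(self.mean_net(obs), self.log_std.exp())

# Example: the discrete Cart Pole task from gym.
env = gym.make("CartPole-v1")
policy = CategoricalPolicy(env.observation_space.shape[0], env.action_space.n)
```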
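The Pseudocode row points to Algorithm 1 (IS-MBPG), whose core step is a STORM-style momentum estimator that mixes the current policy gradient with an importance-weighted correction computed at the previous parameters. The sketch below, reusing the `dist` interface from the policy sketch above, assumes a plain REINFORCE estimator on a single discrete-action trajectory and fixed `eta`/`beta`; the paper's algorithm instead uses adaptive, decaying schedules and a normalized update, so this is illustrative only, not the authors' implementation.

```python
# A hedged sketch of the importance-sampling momentum estimator behind IS-MBPG:
#   u_t = beta * g(tau_t; theta_t)
#         + (1 - beta) * [u_{t-1} + g(tau_t; theta_t) - w * g(tau_t; theta_{t-1})]
# where w is the importance weight of tau_t under the previous policy.
import copy
import torch

def reinforce_grad(policy, traj):
    """REINFORCE gradient estimate g(tau; theta) for one trajectory."""
    obs, acts, rets = traj                      # returns-to-go precomputed
    logp = policy.dist(obs).log_prob(acts)      # [T]; discrete-action case
    obj = (logp * rets).sum()                   # surrogate objective
    return torch.autograd.grad(obj, list(policy.parameters()))

def importance_weight(policy_old, policy_new, traj):
    """w(tau | theta_old, theta_new) = prod_t pi_old(a|s) / pi_new(a|s)."""
    obs, acts, _ = traj
    with torch.no_grad():
        logp_old = policy_old.dist(obs).log_prob(acts).sum()
        logp_new = policy_new.dist(obs).log_prob(acts).sum()
    return torch.exp(logp_old - logp_new)

def is_mbpg_step(policy, policy_prev, u_prev, traj, eta=1e-3, beta=0.2):
    """One momentum update; eta and beta are fixed placeholders here."""
    g_new = reinforce_grad(policy, traj)
    if u_prev is None:                          # first iteration: plain gradient
        u = g_new
    else:
        g_old = reinforce_grad(policy_prev, traj)
        w = importance_weight(policy_prev, policy, traj)
        u = [beta * gn + (1 - beta) * (up + gn - w * go)
             for gn, up, go in zip(g_new, u_prev, g_old)]
    policy_prev = copy.deepcopy(policy)         # remember theta_t before stepping
    with torch.no_grad():                       # gradient ascent on the return
        for p, u_i in zip(policy.parameters(), u):
            p.add_(eta * u_i)
    return policy_prev, u
```

A full training loop would sample a fresh trajectory from the updated policy at every iteration, pass it to `is_mbpg_step` together with the returned previous policy and momentum estimate, and decay the step size and momentum weight over time as Algorithm 1 does with its adaptive schedules.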