Episodic Policy Gradient Training

Authors: Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh

AAAI 2022, pp. 7317-7325 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms.
Researcher Affiliation | Academia | Applied AI Institute, Deakin University, Geelong, Australia. {thai.le, m.abdolshah, thommen.karimpanalgeorge, k.do, dung.nguyen, svetha.venkatesh}@deakin.edu.au
Pseudocode | Yes | Algorithm 1: Episodic Policy Gradient Training.
Open Source Code | Yes | Our code can be found at https://github.com/thaihungle/EPGT
Open Datasets | Yes | We test on 2 environments: Mountain Car Continuous (MCC) and Bipedal Walker (BW)... We conduct experiments on 6 Mujoco environments: Half Cheetah, Hopper, Walker2d, Swimmer, Ant and Humanoid... We adopt 6 standard Atari games...
Dataset Splits | No | The paper describes training durations (e.g., 'train agents for 5 million steps') and evaluation metrics, but does not explicitly provide static training, validation, or test splits. This is common in reinforcement learning, where data is generated through interaction with an environment rather than drawn from a fixed dataset.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory, or the type of computing infrastructure used to run the experiments.
Software Dependencies | No | The paper mentions several policy gradient methods (A2C, ACKTR, PPO) and implies an implementation in Python, but it does not specify version numbers for any software dependencies or libraries.
Experiment Setup | Yes | The experimental details can be found in Appendix B. We test on 2 environments: Mountain Car Continuous (MCC) and Bipedal Walker (BW) with long and short learning rate search ranges ([4×10^-5, 10^-2] and [2.8×10^-4, 1.8×10^-3], respectively). We train agents for 5 million steps and report the mean (and std. if applicable) over 10 runs.
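The Experiment Setup row above quotes two learning-rate search ranges, a 5-million-step training budget, and 10 runs per configuration. The sketch below only illustrates those reported numbers; the log-uniform sampling, the Gym environment IDs, and the mapping of ranges to environments are assumptions made for illustration and are not the paper's EPGT procedure (Algorithm 1).

```python
import numpy as np

# Minimal sketch of the reported search spaces, NOT the authors' EPGT algorithm:
# the report quotes a "long" and a "short" learning-rate search range; drawing
# candidates log-uniformly from them is an assumption for illustration only.
LONG_RANGE = (4e-5, 1e-2)       # "long" learning-rate search range (quoted)
SHORT_RANGE = (2.8e-4, 1.8e-3)  # "short" learning-rate search range (quoted)

# Standard Gym environment IDs assumed to correspond to the environments named
# in the paper (hypothetical mapping, for illustration).
ENV_IDS = ["MountainCarContinuous-v0", "BipedalWalker-v3"]

TOTAL_STEPS = 5_000_000  # "train agents for 5 million steps"
NUM_RUNS = 10            # "report the mean (and std. if applicable) over 10 runs"


def sample_learning_rate(low: float, high: float, rng: np.random.Generator) -> float:
    """Draw one candidate learning rate log-uniformly from [low, high]."""
    return float(np.exp(rng.uniform(np.log(low), np.log(high))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for env_id, (low, high) in zip(ENV_IDS, [LONG_RANGE, SHORT_RANGE]):
        lrs = [sample_learning_rate(low, high, rng) for _ in range(NUM_RUNS)]
        print(f"{env_id}: {NUM_RUNS} candidate learning rates in [{low:g}, {high:g}]")
        print("  ", [f"{lr:.2e}" for lr in lrs])
```

Log-uniform sampling is a customary choice for learning-rate sweeps because the ranges span orders of magnitude; how the authors actually select learning rates within these ranges is described in the paper and its Appendix B, not here.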