Episodic Policy Gradient Training
Authors: Hung Le, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, Svetha Venkatesh
AAAI 2022, pp. 7317-7325
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both continuous and discrete environments demonstrate the advantage of using the proposed method in boosting the performance of various policy gradient algorithms. |
| Researcher Affiliation | Academia | Applied AI Institute, Deakin University, Geelong, Australia {thai.le, m.abdolshah, thommen.karimpanalgeorge, k.do, dung.nguyen, svetha.venkatesh}@deakin.edu.au |
| Pseudocode | Yes | Algorithm 1: Episodic Policy Gradient Training. |
| Open Source Code | Yes | Our code can be found at https://github.com/thaihungle/EPGT |
| Open Datasets | Yes | We test on 2 environments: Mountain Car Continuous (MCC) and Bipedal Walker (BW)... We conduct experiments on 6 Mujoco environments: Half Cheetah, Hopper, Walker2d, Swimmer, Ant and Humanoid... We adopt 6 standard Atari games... |
| Dataset Splits | No | The paper describes training durations (e.g., 'train agents for 5 million steps') and evaluation metrics, but does not explicitly provide information about static training, validation, or test dataset splits, which is common in reinforcement learning where data is generated through interaction with an environment rather than from a fixed dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or types of computing infrastructure used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of different policy gradient methods (A2C, ACKTR, PPO) and implies programming in Python, but it does not specify exact version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The experimental details can be found in Appendix B. We test on 2 environments: Mountain Car Continuous (MCC) and Bipedal Walker (BW) with long and short learning rate search ranges ([4×10⁻⁵, 10⁻²] and [2.8×10⁻⁴, 1.8×10⁻³], respectively). We train agents for 5 million steps and report the mean (and std. if applicable) over 10 runs. |
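The learning-rate search ranges, step budget, and run count quoted above are enough to outline a reproduction harness. Below is a minimal sketch assuming Stable-Baselines3's PPO and Gymnasium (the paper does not name either library or any versions); it only samples fixed learning rates log-uniformly from the reported ranges, whereas EPGT itself adapts the configuration during training via episodic memory, which is not reproduced here.

```python
# Minimal reproduction sketch: sample candidate learning rates from the
# search ranges reported in the paper and train PPO on the two tasks.
# Stable-Baselines3 / Gymnasium are assumptions, not the paper's stated stack.
import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO

SEARCH_RANGES = {
    "long": (4e-5, 1e-2),       # [4×10⁻⁵, 10⁻²]
    "short": (2.8e-4, 1.8e-3),  # [2.8×10⁻⁴, 1.8×10⁻³]
}

def sample_lr(low, high, rng):
    # Log-uniform sampling over the reported range; EPGT instead picks
    # configurations with an episodic-memory policy, not shown here.
    return float(np.exp(rng.uniform(np.log(low), np.log(high))))

rng = np.random.default_rng(0)
for env_id in ["MountainCarContinuous-v0", "BipedalWalker-v3"]:
    for name, (low, high) in SEARCH_RANGES.items():
        lr = sample_lr(low, high, rng)
        env = gym.make(env_id)
        model = PPO("MlpPolicy", env, learning_rate=lr, verbose=0)
        model.learn(total_timesteps=5_000_000)  # "5 million steps" per run
        model.save(f"ppo_{env_id}_{name}_{lr:.2e}")
```

Since the paper reports the mean (and standard deviation where applicable) over 10 runs, a faithful reproduction would repeat each sampled configuration across 10 seeds before aggregating results.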