Model-augmented Prioritized Experience Replay
Authors: Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Indeed, our experimental results on various tasks demonstrate that MaPER can significantly improve the performance of the state-of-the-art off-policy MfRL and MbRL which includes off-policy MfRL algorithms in its policy optimization procedure. |
| Researcher Affiliation | Collaboration | Youngmin Oh¹, Jinwoo Shin², Eunho Yang²˒³, Sung Ju Hwang²˒³ (¹Samsung Advanced Institute of Technology, ²Korea Advanced Institute of Science and Technology, ³AITRICS) |
| Pseudocode | Yes | Algorithm 1 Model-augmented Prioritized Experience Replay based on Actor-Critic Methods |
| Open Source Code | Yes | We describe the implementation details for experiments in Appendix B. We also provide our code. |
| Open Datasets | Yes | For continuous control tasks, we consider not only MuJoCo environments (Todorov et al., 2012), which have been frequently used to validate many RL algorithms, but also PyBullet Gymperium, which are free implementations of the original MuJoCo environments. Other free environments in the OpenAI Gym (Brockman et al., 2016) are also considered. For discrete control tasks, we validate our method on Atari games. |
| Dataset Splits | No | The paper reports means and standard deviations over multiple random runs, but does not specify explicit training/validation/test dataset splits (e.g., percentages or counts). In reinforcement learning, data is typically generated during training and evaluation is performed by running policies in the environment rather than on a static validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam and various RL algorithms (SAC, TD3, Rainbow), but does not specify particular software packages or libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Table B.1 includes the parameters that are used for the experiments under Model-free Reinforcement Learning (MfRL) in this paper. Table B.2 shows parameters for applying MBPO (Janner et al., 2019), which is one of the state-of-the-art model-based RLs. We basically employed parameters introduced in (Janner et al., 2019) for MBPO experiments. |
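
Since the paper builds on prioritized experience replay, the sketch below shows a minimal proportional prioritized replay buffer in Python for orientation. It is a generic PER sketch, not the paper's MaPER implementation: MaPER's Algorithm 1 derives priorities from a model-augmented critic rather than the plain TD error, and the `compute_maper_error` name mentioned in the comments is a hypothetical placeholder for that step.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay buffer (PER-style sketch).

    Generic illustration only; MaPER would replace the TD-error priority
    passed to `update_priorities` with its model-augmented error score
    (e.g., via a hypothetical `compute_maper_error` routine).
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                # how strongly priorities shape sampling
        self.storage = []                 # (s, a, r, s_next, done) tuples
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are sampled at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)] ** self.alpha
        probs = prios / prios.sum()
        idxs = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.storage) * probs[idxs]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, errors, eps=1e-6):
        # Vanilla PER uses absolute TD errors here; MaPER substitutes its own
        # model-augmented error estimates (hypothetical `compute_maper_error`).
        self.priorities[idxs] = np.abs(errors) + eps
```

In an actor-critic training loop, `sample` would feed minibatches to the critic update and `update_priorities` would be called with the freshly computed errors for the sampled indices, mirroring the replay/update cycle outlined in Algorithm 1.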