Model-augmented Prioritized Experience Replay

Authors: Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Indeed, our experimental results on various tasks demonstrate that MaPER can significantly improve the performance of the state-of-the-art off-policy MfRL and MbRL which includes off-policy MfRL algorithms in its policy optimization procedure.
Researcher Affiliation | Collaboration | Youngmin Oh (1), Jinwoo Shin (2), Eunho Yang (2,3), Sung Ju Hwang (2,3); (1) Samsung Advanced Institute of Technology, (2) Korea Advanced Institute of Science and Technology, (3) AITRICS
Pseudocode | Yes | Algorithm 1: Model-augmented Prioritized Experience Replay based on Actor-Critic Methods
Open Source Code | Yes | We describe the implementation details for experiments in Appendix B. We also provide our code.
Open Datasets | Yes | For continuous control tasks, we consider not only MuJoCo environments (Todorov et al., 2012), which have been frequently used to validate many RL algorithms, but also PyBullet Gymperium, which are free implementations of the original MuJoCo environments. Other free environments in OpenAI Gym (Brockman et al., 2016) are also considered. For discrete control tasks, we validate our method on Atari games.
Dataset Splits | No | The paper reports means and standard deviations over multiple random runs, but does not specify explicit training/validation/test splits (e.g., percentages or counts) of any fixed dataset. In reinforcement learning, data is typically generated during training, and evaluation is done by running the policy in the environment rather than on a static validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions optimizers like Adam and various RL algorithms (SAC, TD3, Rainbow), but does not specify particular software packages or libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Table B.1 includes the parameters that are used for the experiments under Model-free Reinforcement Learning (MfRL) in this paper. Table B.2 shows parameters for applying MBPO (Janner et al., 2019), which is one of the state-of-the-art model-based RLs. We basically employed parameters introduced in (Janner et al., 2019) for MBPO experiments.
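
For context on the prioritized replay machinery that the paper's Algorithm 1 builds on, the sketch below shows a minimal proportional prioritized replay buffer in Python whose priority update can additionally fold in model prediction errors. This is an illustrative assumption, not the paper's Algorithm 1: the class name, the update_priorities signature, and the simple additive combination of |TD error| with reward-model and dynamics-model errors are hypothetical choices made here for clarity.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay buffer (Schaul et al. style).

    Illustrative sketch only. MaPER's actual priority (Algorithm 1 in the
    paper) is computed from its model-augmented critic; here we merely show
    a generic priority that can combine TD and model prediction errors.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                      # how strongly priorities shape sampling
        self.storage = []                       # stored transitions
        self.priorities = np.zeros(capacity)    # one priority per slot
        self.pos = 0

    def add(self, transition, priority=1.0):
        # New samples receive at least the current max priority so they are sampled soon.
        max_prio = self.priorities.max() if self.storage else priority
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max(max_prio, 1e-6)
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[: len(self.storage)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by prioritization.
        weights = (len(self.storage) * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.storage[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors, reward_errors=None, dyn_errors=None):
        # Hypothetical model-augmented update: add model prediction errors to |TD error|.
        total = np.abs(np.asarray(td_errors, dtype=np.float64))
        if reward_errors is not None:
            total = total + np.abs(np.asarray(reward_errors, dtype=np.float64))
        if dyn_errors is not None:
            total = total + np.abs(np.asarray(dyn_errors, dtype=np.float64))
        self.priorities[idx] = total + 1e-6
```

In a training loop, one would call sample() to draw a batch, compute TD errors (and, under this sketch's assumption, reward and dynamics prediction errors) for that batch, and pass them back through update_priorities() before the next update step.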