PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

Authors: Yuda Song, Wen Sun

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimentally, we first demonstrate the flexibility and the efficacy of our algorithm on a set of exploration-challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even in common dense reward control benchmarks that do not require heavy exploration."
Researcher Affiliation | Academia | "1Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA 2Department of Computer Science, Cornell University, Ithaca, USA."
Pseudocode | Yes | Algorithm 1 (The PC-MLP Framework); Algorithm 2 (Deep PC-MLP). See the sketch after this table.
Open Source Code | No | The paper does not include any explicit statement about releasing the source code for their proposed method, nor does it provide a link to a code repository.
Open Datasets | Yes | "We test Deep PC-MLP in 10 Mujoco (Todorov et al., 2012) locomotion and navigation environments."
Dataset Splits | No | The paper mentions training with "200k real-world samples" and using "4 random seeds", but it does not specify any dataset splits for training, validation, or testing, nor does it describe a cross-validation setup.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud instance specifications.
Software Dependencies | No | The paper mentions software components like OpenAI Gym, Mujoco, TRPO, and MPPI, but it does not provide specific version numbers for any of these or other software dependencies.
Experiment Setup | Yes | "We include all experiments and hyperparameter details in Appendix D."
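
The Pseudocode row above points to the paper's two listings. As a rough illustration of the policy-cover exploration loop that Algorithm 1 describes (maintain a cover of past policies, collect data from their uniform mixture, build a coverage-based exploration bonus, fit a dynamics model, and plan against reward plus bonus), here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' code: the helper names `featurize`, `fit_model`, and `plan` are hypothetical placeholders, the classic Gym step/reset API is assumed, and the hyperparameters are arbitrary.

import numpy as np

def policy_cover_loop(env, featurize, fit_model, plan, n_iters=10,
                      episodes_per_iter=20, bonus_scale=1.0, ridge=1e-3):
    """Hypothetical sketch of a policy-cover exploration loop in the spirit
    of the paper's Algorithm 1. `featurize`, `fit_model`, and `plan` are
    placeholders supplied by the caller, not part of the original method."""
    # Start the cover from a single random policy (an assumption).
    policy_cover = [lambda s: env.action_space.sample()]
    dataset, feats = [], []
    for _ in range(n_iters):
        # 1. Collect trajectories with the uniform mixture over the cover
        #    (classic Gym API assumed: reset() -> obs, step() -> 4-tuple).
        for _ in range(episodes_per_iter):
            pi = policy_cover[np.random.randint(len(policy_cover))]
            s, done = env.reset(), False
            while not done:
                a = pi(s)
                s_next, r, done, _ = env.step(a)
                dataset.append((s, a, r, s_next))
                feats.append(featurize(s, a))
                s = s_next
        # 2. Ridge-regularized empirical feature covariance of the mixture.
        Phi = np.stack(feats)
        Sigma = Phi.T @ Phi / len(Phi) + ridge * np.eye(Phi.shape[1])
        Sigma_inv = np.linalg.inv(Sigma)
        # 3. Elliptical bonus: large where the cover's coverage is poor.
        def bonus(s, a, Sigma_inv=Sigma_inv):
            phi = featurize(s, a)
            return bonus_scale * float(phi @ Sigma_inv @ phi)
        # 4. Fit a model on all data so far, plan against reward + bonus,
        #    and grow the cover with the resulting policy.
        model = fit_model(dataset)
        policy_cover.append(plan(model, bonus))
    return policy_cover

The ridge term keeps the covariance invertible in early iterations, when the cover contains only the initial random policy and the collected features may not span the full feature space.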