PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
Authors: Yuda Song, Wen Sun
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we first demonstrate the flexibility and the efficacy of our algorithm on a set of exploration-challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even in common dense-reward control benchmarks that do not require heavy exploration. |
| Researcher Affiliation | Academia | (1) Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA; (2) Department of Computer Science, Cornell University, Ithaca, USA. |
| Pseudocode | Yes | Algorithm 1: The PC-MLP Framework; Algorithm 2: Deep PC-MLP (a hedged sketch of the framework's outer loop follows this table). |
| Open Source Code | No | The paper does not include any explicit statement about releasing the source code for their proposed method, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We test Deep PC-MLP in 10 MuJoCo (Todorov et al., 2012) locomotion and navigation environments. |
| Dataset Splits | No | The paper mentions training with "200k real-world samples" and using "4 random seeds", but it does not specify any dataset splits for training, validation, or testing, nor does it describe a cross-validation setup. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as CPU or GPU models, memory, or cloud instance specifications. |
| Software Dependencies | No | The paper mentions software components such as OpenAI Gym, MuJoCo, TRPO, and MPPI, but it does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We include all experiment and hyperparameter details in Appendix D. |
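For orientation, below is a minimal, hypothetical Python sketch of the kind of loop that Algorithm 1 ("The PC-MLP Framework") describes: maintain a policy cover, collect data from the mixture over that cover, fit a dynamics model by regression, derive an exploration bonus from the empirical feature covariance, and plan against the bonus-augmented reward. Everything here (`pc_mlp`, `featurize`, the toy point environment, the linear model, the random-shooting planner) is an assumption made for illustration, not the authors' code; the paper's Deep PC-MLP uses learned neural dynamics models and planners such as TRPO or MPPI in place of these stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2D point environment (an assumption for this sketch): the agent
# nudges a point toward a fixed goal under noisy linear dynamics.
GOAL = np.array([1.0, 1.0])

def step(s, a):
    """True dynamics, unknown to the agent, with small Gaussian noise."""
    return s + 0.1 * a + 0.01 * rng.normal(size=2)

def reward(s):
    return -float(np.linalg.norm(s - GOAL))

def featurize(s, a):
    """Simple linear features over state, action, and a bias term."""
    return np.concatenate([s, a, [1.0]])

def rollout(policy, horizon=20):
    s, traj = np.zeros(2), []
    for _ in range(horizon):
        a = policy(s)
        s_next = step(s, a)
        traj.append((s.copy(), a, s_next))
        s = s_next
    return traj

def fit_linear_model(data):
    """Least-squares dynamics model: s' ~= W @ featurize(s, a)."""
    X = np.stack([featurize(s, a) for s, a, _ in data])
    Y = np.stack([s_next for _, _, s_next in data])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W.T

def plan_random_shooting(W, bonus, horizon=20, n_candidates=200):
    """Crude stand-in for the paper's planners (TRPO / MPPI): score random
    action sequences inside the learned model under reward + bonus."""
    best_seq, best_ret = None, -np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        s, ret = np.zeros(2), 0.0
        for a in seq:
            ret += reward(s) + bonus(s, a)
            s = W @ featurize(s, a)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return lambda s, seq=best_seq: seq[0]  # open-loop first action

def pc_mlp(n_iterations=5, n_rollouts=50, bonus_scale=1.0, reg=1.0):
    # The policy cover starts with a single random policy.
    cover = [lambda s: rng.uniform(-1.0, 1.0, size=2)]
    for _ in range(n_iterations):
        # 1) Collect data from the uniform mixture over the policy cover.
        data = [t for _ in range(n_rollouts)
                for t in rollout(cover[rng.integers(len(cover))])]
        # 2) Fit the dynamics model on the aggregated data.
        W = fit_linear_model(data)
        # 3) Elliptical exploration bonus from the empirical feature covariance.
        Phi = np.stack([featurize(s, a) for s, a, _ in data])
        Sigma_inv = np.linalg.inv(Phi.T @ Phi + reg * np.eye(Phi.shape[1]))
        def bonus(s, a, Sigma_inv=Sigma_inv):
            f = featurize(s, a)
            return bonus_scale * float(np.sqrt(f @ Sigma_inv @ f))
        # 4) Plan in the learned model with the bonus-augmented reward,
        #    then grow the cover with the newly planned policy.
        cover.append(plan_random_shooting(W, bonus))
    return cover

if __name__ == "__main__":
    policies = pc_mlp()
    print(f"policy cover size after training: {len(policies)}")
```

The covariance-based bonus rewards state-action pairs whose features are poorly covered by data gathered so far, which is what drives the exploration behavior the paper evaluates on its exploration-challenging tasks.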