Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning
Authors: Sebastian Curi, Felix Berkenkamp, Andreas Krause
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions, a setting that is notoriously difficult for existing model-based reinforcement learning algorithms. |
| Researcher Affiliation | Collaboration | Sebastian Curi, Department of Computer Science, ETH Zurich (scuri@inf.ethz.ch); Felix Berkenkamp, Bosch Center for Artificial Intelligence (felix.berkenkamp@de.bosch.com); Andreas Krause, Department of Computer Science, ETH Zurich (krausea@ethz.ch) |
| Pseudocode | Yes | Algorithm 1 Model-based Reinforcement Learning; Algorithm 2 H-UCRL combining Optimistic Policy Search and Planning |
| Open Source Code | Yes | We provide an open-source implementation of our method, which is available at http://github.com/sebascuri/hucrl. |
| Open Datasets | No | The paper uses MuJoCo environments for experiments but does not provide access information (link, DOI, formal citation) to any pre-existing public dataset, as the training data is generated online during experimental rollouts. |
| Dataset Splits | No | The paper describes an episodic learning setting in which data is collected from environment rollouts in each episode and used to update the model. It does not specify explicit train/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions PyTorch only implicitly, via the GitHub link to 'rl-lib', a PyTorch-based library, and does not specify exact version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | Throughout the experiments, we consider reward functions of the form r(s, a) = r_state(s) − ρ · c_action(a), where r_state(s) is the reward for being in a good state, and ρ ∈ [0, ∞) is a parameter that scales the action costs c_action(a). ... As modeling choice, we use 5-head probabilistic ensembles as in Chua et al. (2018). ... For more experimental details and learning curves, see Appendix B. (Hedged sketches of this reward structure and of the optimistic planning idea appear below the table.) |
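
For context on the experiment-setup row, the following minimal Python sketch illustrates the r(s, a) = r_state(s) − ρ · c_action(a) reward form used throughout the paper's experiments. The specific state reward and quadratic action cost below are illustrative assumptions, not the paper's exact task definitions.

```python
import numpy as np

def make_reward(state_reward, action_cost, rho=0.1):
    """Compose a reward of the form r(s, a) = r_state(s) - rho * c_action(a)."""
    def reward(state, action):
        return state_reward(state) - rho * action_cost(action)
    return reward

# Hypothetical example with a quadratic action cost, common in MuJoCo-style tasks.
state_reward = lambda s: float(s[0])            # assumed: reward forward position
action_cost = lambda a: float(np.sum(a ** 2))   # assumed: quadratic control penalty
reward_fn = make_reward(state_reward, action_cost, rho=0.1)

s, a = np.array([1.0, 0.0]), np.array([0.5, -0.2])
print(reward_fn(s, a))  # 1.0 - 0.1 * 0.29 = 0.971
```

Larger values of ρ make exploration harder, which is the regime where the paper reports the biggest gains from optimistic exploration.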
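The H-UCRL planner referenced in the 'Pseudocode' row optimizes the policy jointly with a hallucinated control that selects the most favorable next state inside the learned model's confidence bounds. The sketch below is an assumption-laden illustration of that idea, not the repository's API; the names `mean_fn`, `std_fn`, `hallucination`, and `beta` are hypothetical placeholders, and in the paper both the policy and the hallucination are subsequently optimized with a model-based planner or policy-search method.

```python
import numpy as np

def hallucinated_step(mean_fn, std_fn, state, action, eta, beta=1.0):
    """One optimistic ('hallucinated') transition: the next state is picked
    inside the model's confidence interval,
    s' = mu(s, a) + beta * sigma(s, a) * eta, with eta in [-1, 1]^d."""
    eta = np.clip(eta, -1.0, 1.0)
    return mean_fn(state, action) + beta * std_fn(state, action) * eta

def optimistic_return(policy, hallucination, mean_fn, std_fn, reward_fn,
                      init_state, horizon=10, beta=1.0):
    """Roll out the policy under the most optimistic dynamics that the
    model's epistemic uncertainty still allows, and sum the rewards."""
    state, total = init_state, 0.0
    for _ in range(horizon):
        action = policy(state)
        total += reward_fn(state, action)
        state = hallucinated_step(mean_fn, std_fn, state, action,
                                  hallucination(state), beta)
    return total
```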