Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Authors: Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency, and in contrast to previous model-based approaches, performance does not degrade in complex environments. We systematically evaluate STEVE on several challenging continuous control benchmarks and demonstrate that STEVE significantly outperforms model-free baselines with an order-of-magnitude increase in sample efficiency.
Researcher Affiliation | Industry | Google Brain, Mountain View, CA, USA jacobbuckman@gmail.com, mail@danijar.com, {gjt,ebrevdo,honglak}@google.com
Pseudocode | No | The paper includes a section titled 'Algorithm' but it describes the algorithm in text rather than providing a structured pseudocode block.
Open Source Code | Yes | Our code is available open-source at: https://github.com/tensorflow/models/tree/master/research/steve
Open Datasets | Yes | We evaluated STEVE on a variety of continuous control tasks [3, 19]; we plot learning curves in Figure 3. ... [3] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016. [19] O. Klimov and J. Schulman. Roboschool. https://github.com/openai/roboschool.
Dataset Splits | No | The paper does not provide specific train/validation/test dataset split percentages or counts. Reinforcement learning environments are used where data is collected through interaction.
Hardware Specification | Yes | Both were trained on a P100 GPU and had 8 CPUs collecting data; STEVE-DDPG additionally used a second P100 to learn a model in parallel.
Software Dependencies | No | The paper states 'All algorithms were implemented in Tensorflow [1]' but does not provide a specific version number for TensorFlow or any other software dependencies.
Experiment Setup | No | For hyperparameters and additional implementation details, please see Appendix C.
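
Since the paper is reported to describe its algorithm in text only, the following is a minimal, unofficial sketch of one idea behind stochastic ensemble value expansion: combining candidate value targets from several rollout horizons by inverse-variance weighting, so that horizons on which the ensemble disagrees contribute less. The function name, the list-of-lists layout, the `eps` term, and the use of population variance are assumptions made for illustration; they are not taken from the released code.

```python
from statistics import fmean, pvariance


def inverse_variance_target(candidates, eps=1e-8):
    """Combine per-horizon candidate targets by inverse-variance weighting.

    candidates[h] holds the ensemble's target estimates for rollout
    horizon h (hypothetical layout chosen for this sketch).
    """
    # Mean estimate for each horizon across the ensemble.
    means = [fmean(c) for c in candidates]
    # Inverse of the empirical variance per horizon; eps (an assumption of
    # this sketch) avoids division by zero when the ensemble agrees exactly.
    inv_vars = [1.0 / (pvariance(c) + eps) for c in candidates]
    total = sum(inv_vars)
    # Normalize so the weights sum to one.
    weights = [v / total for v in inv_vars]
    target = sum(w * m for w, m in zip(weights, means))
    return target, weights


# Horizon 0: the ensemble agrees closely; horizon 1: it disagrees strongly.
candidates = [[1.0, 1.1, 0.9], [2.0, 0.0, 4.0]]
target, weights = inverse_variance_target(candidates)
print(target, weights)
```

In this toy input the low-variance horizon receives nearly all of the weight, so the combined target stays close to that horizon's mean.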