Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Authors: Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

NeurIPS 2018 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency, and in contrast to previous model-based approaches, performance does not degrade in complex environments. We systematically evaluate STEVE on several challenging continuous control benchmarks and demonstrate that STEVE significantly outperforms model-free baselines with an order-of-magnitude increase in sample efficiency.
Researcher Affiliation | Industry | Google Brain, Mountain View, CA, USA EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper includes a section titled 'Algorithm' but it describes the algorithm in text rather than providing a structured pseudocode block.
Open Source Code | Yes | Our code is available open-source at: https://github.com/tensorflow/models/tree/master/research/steve
Open Datasets | Yes | We evaluated STEVE on a variety of continuous control tasks [3, 19]; we plot learning curves in Figure 3. ... [3] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016. [19] O. Klimov and J. Schulman. Roboschool. https://github.com/openai/roboschool.
Dataset Splits | No | The paper does not provide specific train/validation/test dataset split percentages or counts. Reinforcement learning environments are used where data is collected through interaction.
Hardware Specification | Yes | Both were trained on a P100 GPU and had 8 CPUs collecting data; STEVE-DPPG additionally used a second P100 to learn a model in parallel.
Software Dependencies | No | The paper states 'All algorithms were implemented in Tensorflow [1]' but does not provide a specific version number for TensorFlow or any other software dependencies.
Experiment Setup | No | For hyperparameters and additional implementation details, please see Appendix C.