reproducibilityindex.ai

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Authors: Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

NeurIPS 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency, and in contrast to previous model-based approaches, performance does not degrade in complex environments. We systematically evaluate STEVE on several challenging continuous control benchmarks and demonstrate that STEVE significantly outperforms model-free baselines with an order-of-magnitude increase in sample efficiency.
Researcher Affiliation	Industry	Google Brain, Mountain View, CA, USA jacobbuckman@gmail.com, mail@danijar.com, {gjt,ebrevdo,honglak}@google.com
Pseudocode	No	The paper includes a section titled 'Algorithm' but it describes the algorithm in text rather than providing a structured pseudocode block.
Open Source Code	Yes	Our code is available open-source at: https://github.com/tensorflow/models/tree/master/research/steve
Open Datasets	Yes	We evaluated STEVE on a variety of continuous control tasks [3, 19]; we plot learning curves in Figure 3. ... [3] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016. [19] O. Klimov and J. Schulman. Roboschool. https://github.com/openai/roboschool.
Dataset Splits	No	The paper does not provide specific train/validation/test dataset split percentages or counts. Reinforcement learning environments are used where data is collected through interaction.
Hardware Specification	Yes	Both were trained on a P100 GPU and had 8 CPUs collecting data; STEVE-DDPG additionally used a second P100 to learn a model in parallel.
Software Dependencies	No	The paper states 'All algorithms were implemented in Tensorflow [1]' but does not provide a specific version number for TensorFlow or any other software dependencies.
Experiment Setup	No	For hyperparameters and additional implementation details, please see Appendix C.