Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Authors: Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our approach outperforms model-free baselines on challenging continuous control benchmarks with an order-of-magnitude increase in sample efficiency, and in contrast to previous model-based approaches, performance does not degrade in complex environments." ... "We systematically evaluate STEVE on several challenging continuous control benchmarks and demonstrate that STEVE significantly outperforms model-free baselines with an order-of-magnitude increase in sample efficiency." |
| Researcher Affiliation | Industry | Google Brain, Mountain View, CA, USA jacobbuckman@gmail.com, mail@danijar.com, {gjt,ebrevdo,honglak}@google.com |
| Pseudocode | No | The paper includes a section titled 'Algorithm' but it describes the algorithm in text rather than providing a structured pseudocode block. |
| Open Source Code | Yes | "Our code is available open-source at: https://github.com/tensorflow/models/tree/master/research/steve" |
| Open Datasets | Yes | "We evaluated STEVE on a variety of continuous control tasks [3, 19]; we plot learning curves in Figure 3." ... [3] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016. [19] O. Klimov and J. Schulman. Roboschool. https://github.com/openai/roboschool. |
| Dataset Splits | No | The paper does not provide train/validation/test split percentages or counts. This is expected: data is collected through interaction with reinforcement learning environments, so fixed dataset splits do not apply. |
| Hardware Specification | Yes | "Both were trained on a P100 GPU and had 8 CPUs collecting data; STEVE-DDPG additionally used a second P100 to learn a model in parallel." |
| Software Dependencies | No | The paper states "All algorithms were implemented in Tensorflow [1]" but does not pin a version number for TensorFlow or any other software dependency. |
| Experiment Setup | No | The main text defers setup details rather than stating them: "For hyperparameters and additional implementation details, please see Appendix C." |