Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters

Authors: Vladislav Kurenkov, Sergey Kolesnikov

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we delineate that the online evaluation budget is problem-dependent, where some problems allow for less but others for more. And second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Following the points above, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets.
Researcher Affiliation | Industry | Tinkoff, Moscow, Russia. Correspondence to: Vladislav Kurenkov <v.kurenkov@tinkoff.ai>.
Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available at tinkoff-ai.github.io/eop
Open Datasets | Yes | Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv:2004.07219 [cs, stat], February 2021. Gulcehre, C., Wang, Z., Novikov, A., Paine, T. L., Colmenarejo, S. G., Zolna, K., Agarwal, R., Merel, J., Mankowitz, D., Paduraru, C., Dulac-Arnold, G., Li, J., Norouzi, M., Hoffman, M., Nachum, O., Tucker, G., Heess, N., and de Freitas, N. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning. arXiv:2006.13888 [cs, stat], February 2021.
Dataset Splits | Yes | First, the dataset D is randomly split trajectory-wise into training D_T and validation D_V subsets.
Hardware Specification | Yes | The experiments were run on a computational cluster with 14x NVIDIA Tesla V100, 256GB RAM, and an Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz (72 cores) for 13 days.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The hyperparameter grids were deferred to Appendix B.2. The exact hyperparameter space can be found in Table 5.
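The Research Type row recommends reporting offline RL performance under varying online evaluation budgets. A minimal sketch of one way to produce such a budget curve follows; it is an assumption, not the implementation released at tinkoff-ai.github.io/eop. Given the online scores of a pool of policies trained offline (for example, one per hyperparameter configuration), it computes the expected best score when `budget` policies are drawn uniformly with replacement from the pool and each is evaluated online once. The helper name `expected_max_score` is hypothetical.

```python
from typing import Sequence


def expected_max_score(scores: Sequence[float], budget: int) -> float:
    """Expected maximum of `budget` i.i.d. draws (with replacement) from the
    empirical distribution of `scores`.

    Hypothetical helper for illustration; not the paper's API.
    """
    if budget < 1:
        raise ValueError("budget must be >= 1")
    if len(scores) == 0:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, value in enumerate(ordered, start=1):
        # Probability that the best of `budget` draws is the i-th smallest score:
        # P(all draws <= i-th) - P(all draws <= (i-1)-th) under the empirical CDF.
        weight = (i / n) ** budget - ((i - 1) / n) ** budget
        total += value * weight
    return total


if __name__ == "__main__":
    # Toy pools (illustrative numbers, not results from the paper):
    # `steady` is uniformly mediocre, `risky` mostly fails but has one strong policy.
    steady = [50.0, 50.0, 50.0, 50.0]
    risky = [0.0, 0.0, 0.0, 100.0]
    for b in (1, 2, 4, 8):
        print(b, expected_max_score(steady, b), expected_max_score(risky, b))
```

With these toy pools, `steady` is preferred at a budget of 1 while `risky` is preferred at a budget of 8, which is the kind of budget-dependent preference the row describes. Drawing with replacement keeps the estimator defined for budgets larger than the pool.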
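The Dataset Splits row describes a random trajectory-wise split of D into D_T and D_V. A minimal sketch, assuming the dataset is already grouped into a list of trajectories and using a hypothetical 20% validation fraction (the fraction is not stated in this row). The function name `split_trajectory_wise` is illustrative only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_trajectory_wise(
    trajectories: Sequence[Sequence[T]],
    val_fraction: float = 0.2,  # assumed value, not taken from the paper
    seed: int = 0,
) -> Tuple[List[Sequence[T]], List[Sequence[T]]]:
    """Randomly assign whole trajectories to D_T (train) or D_V (validation)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    indices = list(range(len(trajectories)))
    random.Random(seed).shuffle(indices)  # seeded shuffle for a reproducible split
    n_val = max(1, round(val_fraction * len(trajectories)))
    val_idx = set(indices[:n_val])
    # Every transition of a trajectory lands on the same side of the split.
    train = [traj for i, traj in enumerate(trajectories) if i not in val_idx]
    val = [traj for i, traj in enumerate(trajectories) if i in val_idx]
    return train, val
```

Splitting by trajectory rather than by individual transition keeps each episode entirely in D_T or entirely in D_V, so validation transitions are never temporally adjacent to training transitions from the same episode.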