Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
Authors: Vladislav Kurenkov, Sergey Kolesnikov
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we delineate that the online evaluation budget is problem-dependent, where some problems allow for less but others for more. And second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Following the points above, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets. |
| Researcher Affiliation | Industry | Tinkoff, Moscow, Russia. Correspondence to: Vladislav Kurenkov <v.kurenkov@tinkoff.ai>. |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is available at tinkoff-ai.github.io/eop |
| Open Datasets | Yes | Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv:2004.07219 [cs, stat], February 2021a. Gulcehre, C., Wang, Z., Novikov, A., Paine, T. L., Colmenarejo, S. G., Zolna, K., Agarwal, R., Merel, J., Mankowitz, D., Paduraru, C., Dulac-Arnold, G., Li, J., Norouzi, M., Hoffman, M., Nachum, O., Tucker, G., Heess, N., and de Freitas, N. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning. arXiv:2006.13888 [cs, stat], February 2021. |
| Dataset Splits | Yes | First, the dataset D is randomly split trajectory-wise into training D_T and validation D_V subsets accordingly. |
| Hardware Specification | Yes | The experiments were run on a computational cluster with 14x NVIDIA Tesla V100, 256GB RAM, and Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz (72 cores) for 13 days. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The hyperparameter grids were deferred to the Appendix B.2. The exact hyperparameter space can be found in Table 5. |
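
The Dataset Splits row quotes a random, trajectory-wise split of the dataset into training and validation subsets. A minimal sketch of such a split follows, under stated assumptions: the function name `split_trajectories`, the validation fraction, and the seed are hypothetical and are not taken from the paper or its code release. The point it illustrates is that whole trajectories, not individual transitions, are assigned to one subset or the other.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Trajectory = TypeVar("Trajectory")


def split_trajectories(
    trajectories: Sequence[Trajectory],
    val_fraction: float = 0.2,  # assumed value, not from the paper
    seed: int = 0,              # assumed value, not from the paper
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Randomly split a dataset trajectory-wise into (train, validation)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    if len(trajectories) < 2:
        raise ValueError("need at least two trajectories to split")

    # Shuffle trajectory indices with a local RNG so the split is reproducible.
    indices = list(range(len(trajectories)))
    random.Random(seed).shuffle(indices)

    # Keep at least one trajectory on each side of the split.
    n_val = int(round(len(indices) * val_fraction))
    n_val = min(max(1, n_val), len(indices) - 1)
    val_indices = set(indices[:n_val])

    # Each trajectory goes entirely to one subset, preserving original order.
    train = [t for i, t in enumerate(trajectories) if i not in val_indices]
    val = [t for i, t in enumerate(trajectories) if i in val_indices]
    return train, val
```

Splitting by trajectory rather than by transition keeps consecutive steps of one episode from landing in both subsets, which would otherwise leak highly correlated data into validation.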