Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters

Authors: Vladislav Kurenkov, Sergey Kolesnikov

ICML 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	First, we delineate that the online evaluation budget is problem-dependent, where some problems allow for less but others for more. And second, we demonstrate that the preference between algorithms is budget-dependent across a diverse range of decision-making domains such as Robotics, Finance, and Energy Management. Following the points above, we suggest reporting the performance of deep offline RL algorithms under varying online evaluation budgets.
Researcher Affiliation	Industry	¹Tinkoff, Moscow, Russia. Correspondence to: Vladislav Kurenkov <EMAIL>.
Pseudocode	No	The paper does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code	Yes	Code is available at tinkoff-ai.github.io/eop
Open Datasets	Yes	Fu, J., Kumar, A., Nachum, O., Tucker, G., and Levine, S. D4RL: Datasets for Deep Data-Driven Reinforcement Learning. arXiv:2004.07219 [cs, stat], February 2021a. Gulcehre, C., Wang, Z., Novikov, A., Paine, T. L., Colmenarejo, S. G., Zolna, K., Agarwal, R., Merel, J., Mankowitz, D., Paduraru, C., Dulac-Arnold, G., Li, J., Norouzi, M., Hoffman, M., Nachum, O., Tucker, G., Heess, N., and de Freitas, N. RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning. arXiv:2006.13888 [cs, stat], February 2021.
Dataset Splits	Yes	First, the dataset D is randomly split trajectory-wise into training D_T and validation D_V subsets accordingly.
Hardware Specification	Yes	The experiments were run on a computational cluster with 14x NVIDIA Tesla V100, 256GB RAM, and Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz (72 cores) for 13 days.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup	Yes	The hyperparameter grids were deferred to the Appendix B.2. The exact hyperparameter space can be found in Table 5.
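
The trajectory-wise split quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `split_trajectories`, the `val_fraction` default of 0.2, and the seeding scheme are assumptions, since the quoted excerpt does not state a split ratio. The point it shows is that whole trajectories, not individual transitions, are assigned to either the training or the validation subset.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # a trajectory: any object, e.g. a list of transitions


def split_trajectories(
    trajectories: Sequence[T],
    val_fraction: float = 0.2,  # assumed ratio; not given in the quoted text
    seed: int = 0,              # assumed seeding scheme, for repeatability
) -> Tuple[List[T], List[T]]:
    """Randomly split a dataset into training and validation subsets,
    keeping every trajectory intact in exactly one subset."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    if len(trajectories) < 2:
        raise ValueError("need at least two trajectories to split")

    # Shuffle trajectory indices, not individual transitions.
    indices = list(range(len(trajectories)))
    random.Random(seed).shuffle(indices)

    # Keep at least one trajectory on each side of the split.
    n_val = round(len(indices) * val_fraction)
    n_val = min(max(1, n_val), len(indices) - 1)
    val_idx = set(indices[:n_val])

    train = [t for i, t in enumerate(trajectories) if i not in val_idx]
    val = [t for i, t in enumerate(trajectories) if i in val_idx]
    return train, val


if __name__ == "__main__":
    # Toy dataset: 10 trajectories of 3 (trajectory_id, step) transitions each.
    data = [[(i, step) for step in range(3)] for i in range(10)]
    d_train, d_val = split_trajectories(data, val_fraction=0.2, seed=0)
    print(len(d_train), len(d_val))
```

Splitting by trajectory rather than by transition avoids placing steps of the same episode in both subsets, which would leak information from training into validation.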