Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Hybrid RL: Using both offline and online data can make RL efficient
Authors: Yuda Song, Yifei Zhou, Ayush Sekhari, Drew Bagnell, Akshay Krishnamurthy, Wen Sun
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we discuss empirical results comparing Hy-Q to several representative RL methods on two challenging benchmarks. |
| Researcher Affiliation | Collaboration | Yuda Song Carnegie Mellon University Yifei Zhou Cornell University Ayush Sekhari MIT J. Andrew Bagnell Carnegie Mellon University Akshay Krishnamurthy Microsoft Research Wen Sun Cornell University |
| Pseudocode | Yes | Algorithm 1 Hybrid Q-learning using both offline and online data (Hy-Q) |
| Open Source Code | Yes | We also open source our code at https://github.com/yudasong/HyQ. |
| Open Datasets | No | Our (offline) dataset can be reproduced with the attached instructions, and our results could be reproduced with the given random seeds. |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test dataset splits. It discusses training and evaluation but not explicit data partitioning. |
| Hardware Specification | Yes | We run our experiments on a cluster of computes with Nvidia RTX 3090 GPUs and various CPUs which do not incur any randomness to the results. |
| Software Dependencies | No | The paper mentions tools like 'Adam' optimizer and implies 'PyTorch' use through a GitHub link, but it does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | We provide the hyperparameters of Hy-Q in Table. 1. In addition, we provide the hyperparameters we tried for CQL baseline in Table. 2. |