Hybrid RL: Using both offline and online data can make RL efficient
Authors: Yuda Song, Yifei Zhou, Ayush Sekhari, Drew Bagnell, Akshay Krishnamurthy, Wen Sun
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we discuss empirical results comparing Hy-Q to several representative RL methods on two challenging benchmarks. |
| Researcher Affiliation | Collaboration | Yuda Song Carnegie Mellon University Yifei Zhou Cornell University Ayush Sekhari MIT J. Andrew Bagnell Carnegie Mellon University Akshay Krishnamurthy Microsoft Research Wen Sun Cornell University |
| Pseudocode | Yes | Algorithm 1 Hybrid Q-learning using both offline and online data (Hy-Q) |
| Open Source Code | Yes | We also open source our code at https://github.com/yudasong/HyQ. |
| Open Datasets | No | Our (offline) dataset can be reproduced with the attached instructions, and our results could be reproduced with the given random seeds. |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test dataset splits. It discusses training and evaluation but not explicit data partitioning. |
| Hardware Specification | Yes | We run our experiments on a cluster of computers with Nvidia RTX 3090 GPUs and various CPUs, which do not incur any randomness to the results. |
| Software Dependencies | No | The paper mentions tools like 'Adam' optimizer and implies 'PyTorch' use through a GitHub link, but it does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | We provide the hyperparameters of Hy-Q in Table. 1. In addition, we provide the hyperparameters we tried for CQL baseline in Table. 2. |