Hybrid RL: Using both offline and online data can make RL efficient

Authors: Yuda Song, Yifei Zhou, Ayush Sekhari, Drew Bagnell, Akshay Krishnamurthy, Wen Sun

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In this section we discuss empirical results comparing Hy-Q to several representative RL methods on two challenging benchmarks." |
| Researcher Affiliation | Collaboration | Yuda Song (Carnegie Mellon University), Yifei Zhou (Cornell University), Ayush Sekhari (MIT), J. Andrew Bagnell (Carnegie Mellon University), Akshay Krishnamurthy (Microsoft Research), Wen Sun (Cornell University) |
| Pseudocode | Yes | "Algorithm 1: Hybrid Q-learning using both offline and online data (Hy-Q)" |
| Open Source Code | Yes | "We also open source our code at https://github.com/yudasong/HyQ." |
| Open Datasets | No | "Our (offline) dataset can be reproduced with the attached instructions, and our results can be reproduced with the given random seeds." |
| Dataset Splits | No | The paper does not provide specific percentages or counts for training, validation, or test splits. It discusses training and evaluation but not explicit data partitioning. |
| Hardware Specification | Yes | "We run our experiments on a cluster of computers with Nvidia RTX 3090 GPUs and various CPUs, which do not introduce any randomness into the results." |
| Software Dependencies | No | The paper mentions the 'Adam' optimizer and implies PyTorch use through its GitHub link, but it does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | "We provide the hyperparameters of Hy-Q in Table 1. In addition, we provide the hyperparameters we tried for the CQL baseline in Table 2." |
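The core idea behind Algorithm 1 (Hy-Q) is to fit a Q-function iteratively on a mixture of a fixed offline dataset and transitions freshly collected by the current greedy policy. The tabular sketch below illustrates that hybrid-data loop only; the environment interface, function names, 50/50 sampling ratio, and step-size schedule are illustrative assumptions, not the authors' exact implementation (which uses neural function approximation).

```python
import random
from collections import defaultdict

def hy_q_sketch(env, offline_data, num_iters=100, rollout_len=50,
                batch_size=32, gamma=0.99, lr=0.1, actions=(0, 1)):
    """Minimal tabular sketch of hybrid Q-learning (Hy-Q style):
    each iteration rolls out the current greedy policy to gather
    online data, then updates Q on a mix of offline + online samples."""
    Q = defaultdict(float)   # Q[(state, action)] -> value estimate
    online_data = []         # transitions gathered during online rollouts

    def greedy(s):
        # Greedy action under the current Q estimate.
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(num_iters):
        # 1) Collect online data with the current greedy policy.
        s = env.reset()
        for _ in range(rollout_len):
            a = greedy(s)
            s2, r, done = env.step(a)
            online_data.append((s, a, r, s2, done))
            s = env.reset() if done else s2

        # 2) Q update on a 50/50 mix of offline and online transitions --
        #    the hybrid ingredient: both data sources appear in every batch.
        half = batch_size // 2
        batch = (random.sample(offline_data, min(half, len(offline_data)))
                 + random.sample(online_data, min(half, len(online_data))))
        for (s0, a0, r, s1, done) in batch:
            target = r if done else r + gamma * max(Q[(s1, a)] for a in actions)
            Q[(s0, a0)] += lr * (target - Q[(s0, a0)])
    return Q
```

Even when the greedy policy initially explores poorly, the offline portion of each batch keeps propagating value from states the policy has not yet reached, which is the intuition for why the hybrid approach can be more sample-efficient than purely online Q-learning.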