Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Authors: Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data. [...] We characterize the sample complexity of our algorithm (see Theorem 1), which provably improves upon both pure online and offline RL. |
| Researcher Affiliation | Academia | Gen Li (CUHK); Wenhao Zhan (Princeton); Jason D. Lee (Princeton); Yuejie Chi (CMU); Yuxin Chen (UPenn) |
| Pseudocode | Yes | Due to space limitation, the pseudocode of the complete algorithm is provided in Appendix B. [...] Algorithm 1: The proposed hybrid RL algorithm. |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on empirical experiments using a specific dataset. |
| Dataset Splits | No | The paper is theoretical and does not report on empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers for experimental reproduction. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup with specific hyperparameters or training configurations. |