Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
Authors: Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data. [...] We characterize the sample complexity of our algorithm (see Theorem 1), which provably improves upon both pure online and offline RL. |
| Researcher Affiliation | Academia | Gen Li (CUHK); Wenhao Zhan (Princeton); Jason D. Lee (Princeton); Yuejie Chi (CMU); Yuxin Chen (UPenn) |
| Pseudocode | Yes | Due to space limitation, the pseudocode of the complete algorithm is provided in Appendix B. [...] Algorithm 1: The proposed hybrid RL algorithm. |
| Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on empirical experiments using a specific dataset. |
| Dataset Splits | No | The paper is theoretical and does not report on empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers for experimental reproduction. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup with specific hyperparameters or training configurations. |