reproducibilityindex.ai

Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning

Authors: Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data. [...] We characterize the sample complexity of our algorithm (see Theorem 1), which provably improves upon both pure online and offline RL.
Researcher Affiliation	Academia	Gen Li (CUHK); Wenhao Zhan (Princeton); Jason D. Lee (Princeton); Yuejie Chi (CMU); Yuxin Chen (UPenn)
Pseudocode	Yes	Due to space limitation, the pseudocode of the complete algorithm is provided in Appendix B. [...] Algorithm 1: The proposed hybrid RL algorithm.
Open Source Code	No	The paper does not provide any explicit statement or link regarding open-source code for the described methodology.
Open Datasets	No	The paper is theoretical and does not report on empirical experiments using a specific dataset.
Dataset Splits	No	The paper is theoretical and does not report on empirical experiments with dataset splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and does not describe any specific hardware used for running experiments.
Software Dependencies	No	The paper is theoretical and does not specify software dependencies with version numbers for experimental reproduction.
Experiment Setup	No	The paper is theoretical and does not describe an empirical experimental setup with specific hyperparameters or training configurations.