Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning
Authors: Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our theory is developed based on a new notion called single-policy partial concentrability, which captures the trade-off between distribution mismatch and miscoverage and guides the interplay between offline and online data. [...] We characterize the sample complexity of our algorithm (see Theorem 1), which provably improves upon both pure online and offline RL. |
| Researcher Affiliation | Academia | Gen Li (CUHK), Wenhao Zhan (Princeton), Jason D. Lee (Princeton), Yuejie Chi (CMU), Yuxin Chen (UPenn) |
| Pseudocode | Yes | Due to space limitation, the pseudocode of the complete algorithm is provided in Appendix B. [...] Algorithm 1: The proposed hybrid RL algorithm. |
| Open Source Code | No | The paper does not provide any explicit statement or link regarding open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not report on empirical experiments using a specific dataset. |
| Dataset Splits | No | The paper is theoretical and does not report on empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers for experimental reproduction. |
| Experiment Setup | No | The paper is theoretical and does not describe an empirical experimental setup with specific hyperparameters or training configurations. |