Leveraging Offline Data in Online Reinforcement Learning

Authors: Andrew Wagenmaker, Aldo Pacchiano

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We characterize the number of online samples necessary in this setting given access to some offline dataset, and develop an algorithm, FTPEDEL, which is provably optimal up to H factors. In addition to introducing the FineTuneRL setting, we make the following contributions: ... 2. We show there exists an algorithm, FTPEDEL, which, up to lower-order terms, only collects, for each step h, min_{T_on} T_on s.t. C_h^{o2o}(D_off, ϵ, T_on) ≤ 1 online episodes (the minimal number of online episodes which ensures the offline-to-online concentrability coefficient is sufficiently small) and returns a policy that is ϵ-optimal. Furthermore, we show that this complexity is necessary: no algorithm can collect fewer online samples and return a policy guaranteed to be ϵ-optimal.
Researcher Affiliation | Collaboration | 1 University of Washington, Seattle. 2 Work done while at Microsoft Research, New York. Current affiliation: Broad Institute of MIT and Harvard, and Boston University.
Pseudocode | Yes | Algorithm 1: Fine-Tuning Policy Learning via Experiment Design in Linear MDPs (FTPEDEL, informal); Algorithm 2: Fine-Tuning Policy Learning via Experiment Design in Linear MDPs (FTPEDEL); Algorithm 3: Online Frank-Wolfe via Regret Minimization (FWREGRET); Algorithm 4: Collect Optimal Covariates (OPTCOV)
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | No | This paper is theoretical and does not involve experiments with a specific dataset. It refers to "offline data" conceptually, as part of its problem definition, rather than as a concrete dataset used for training.
Dataset Splits | No | This paper is theoretical and does not involve experiments or dataset splits for validation.
Hardware Specification | No | This paper is theoretical and does not describe any experimental setup or specific hardware used.
Software Dependencies | No | This paper is theoretical and does not describe any experimental setup or specific software dependencies with version numbers.
Experiment Setup | No | This paper is theoretical and does not describe any experimental setup, hyperparameters, or training settings.
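The sample complexity quoted in the "Research Type" row is a minimal-episode-count optimization: the smallest number of online episodes T_on for which the offline-to-online concentrability coefficient drops to at most 1. As an illustrative sketch (not code from the paper), and assuming the coefficient is non-increasing in T_on, that search can be written as a binary search over a hypothetical stand-in function for C_h^{o2o}(D_off, ϵ, T_on):

```python
# Illustrative sketch only: `concentrability` is a hypothetical stand-in
# for the paper's offline-to-online coefficient C_h^{o2o}(D_off, eps, T_on),
# assumed non-increasing in the number of online episodes T_on.

def minimal_online_episodes(concentrability, t_max=10**6):
    """Return the smallest T_on in [0, t_max] with concentrability(T_on) <= 1.

    Binary search is valid under the (assumed) monotonicity of the
    coefficient in T_on.
    """
    if concentrability(t_max) > 1:
        raise ValueError("coefficient does not reach 1 within t_max episodes")
    lo, hi = 0, t_max
    while lo < hi:
        mid = (lo + hi) // 2
        if concentrability(mid) <= 1:
            hi = mid  # constraint already satisfied at mid
        else:
            lo = mid + 1  # need more online episodes
    return lo

# Toy example: a coefficient that decays as 100 / (T_on + 1);
# the constraint 100 / (T_on + 1) <= 1 first holds at T_on = 99.
print(minimal_online_episodes(lambda t: 100 / (t + 1)))  # → 99
```

The binary search only mirrors the structure of the min-T_on expression; in the paper the coefficient depends on the offline dataset D_off and accuracy ϵ, and the bound is established analytically rather than computed.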