Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Authors: Qiwen Cui, Lin Yang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work studies the sample complexity of finding an ϵ-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. The problem is solved via a plug-in solver approach, which builds an empirical model and then plans in that empirical model with an arbitrary plug-in solver. The paper proves that, under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an ϵ-optimal policy in a γ-discounted MDP is O(K/((1 − γ)³ϵ²)), which depends only on the dimensionality K of the feature space and has no dependence on the sizes of the state or action spaces (the bound is written out after the table below).
Researcher Affiliation | Academia | Qiwen Cui, School of Mathematical Sciences, Peking University (cuiqiwen@pku.edu.cn); Lin F. Yang, Electrical and Computer Engineering Department, University of California, Los Angeles (linyang@ee.ucla.edu)
Pseudocode | Yes | Algorithm 1: Plug-in Solver Based Reinforcement Learning (a hedged code sketch of this kind of procedure follows after the table).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not describe training on specific publicly available datasets. It discusses generative models for sampling, not external datasets.
Dataset Splits | No | The paper does not mention training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe the hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or system-level training settings.
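
As a reading aid for the Research Type row above, the complexity claim can be written out explicitly. The decomposition below reflects how the anchor-state assumption is typically used (anchor features equal to standard basis vectors is an assumption of this note, not a quotation from the paper): under it, the transition model is determined by the K anchor rows,

$$P(\cdot \mid s, a) \;=\; \sum_{k=1}^{K} \phi_k(s, a)\, P(\cdot \mid s_k, a_k), \qquad \phi_k(s, a) \ge 0,$$

and the reported minimax sample complexity for returning a policy $\hat{\pi}$ with $V^{\hat{\pi}}(s) \ge V^{*}(s) - \epsilon$ for all $s$ is

$$O\!\left(\frac{K}{(1-\gamma)^{3}\,\epsilon^{2}}\right)$$

calls to the generative model, independent of the sizes of the state and action spaces.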
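
The Pseudocode row refers to Algorithm 1 (Plug-in Solver Based Reinforcement Learning). Below is a minimal, hedged Python sketch of that kind of procedure, not the paper's exact algorithm: it assumes a tabular generative-model interface generative_model(s, a), a known reward table, and anchor state-action pairs whose features are standard basis vectors; all names (plug_in_solver_rl, phi, anchors, n_samples) are illustrative, and value iteration merely stands in for the arbitrary plug-in solver.

```python
import numpy as np

def plug_in_solver_rl(phi, anchors, generative_model, reward,
                      gamma=0.9, n_samples=1000, n_vi_iters=500):
    """Hedged sketch of a plug-in solver for feature-based RL with anchors.

    phi              : (S, A, K) feature array; each phi[s, a] is assumed to be a
                       non-negative combination of the K anchor features.
    anchors          : list of K (state, action) pairs whose features are the
                       standard basis vectors (an assumption of this sketch).
    generative_model : callable (s, a) -> next state sampled from P(. | s, a).
    reward           : (S, A) array of known rewards.
    """
    num_states, num_actions, K = phi.shape

    # Step 1: query the generative model only at the K anchor pairs and build
    # empirical next-state distributions for them.
    P_anchor_hat = np.zeros((K, num_states))
    for k, (s, a) in enumerate(anchors):
        for _ in range(n_samples):
            P_anchor_hat[k, generative_model(s, a)] += 1.0
        P_anchor_hat[k] /= n_samples

    # Step 2: extend the estimate to every (s, a) through the feature map:
    #   P_hat(. | s, a) = sum_k phi_k(s, a) * P_anchor_hat(. | s_k, a_k)
    P_hat = phi @ P_anchor_hat  # shape (S, A, S')

    # Step 3: plan in the empirical model with an arbitrary plug-in solver;
    # plain value iteration is used here purely as a placeholder solver.
    V = np.zeros(num_states)
    for _ in range(n_vi_iters):
        Q = reward + gamma * (P_hat @ V)  # shape (S, A)
        V = Q.max(axis=1)

    return Q.argmax(axis=1)  # greedy policy in the empirical MDP


# Tiny usage example on a random 2-feature MDP (purely illustrative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, K = 5, 2, 2
    P_anchor_true = rng.dirichlet(np.ones(S), size=K)   # true anchor rows
    phi = rng.dirichlet(np.ones(K), size=(S, A))        # non-negative features
    anchors = [(0, 0), (1, 1)]
    phi[0, 0] = np.eye(K)[0]                             # anchor features are
    phi[1, 1] = np.eye(K)[1]                             # basis vectors
    true_P = phi @ P_anchor_true

    def generative_model(s, a):
        return rng.choice(S, p=true_P[s, a])

    reward = rng.random((S, A))
    policy = plug_in_solver_rl(phi, anchors, generative_model, reward)
    print("greedy policy:", policy)
```

Note that sampling is concentrated on the K anchor pairs, which is what makes the number of generative-model calls scale with K rather than with the number of state-action pairs.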