Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?
Authors: Qiwen Cui, Lin Yang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work considers sample complexity of finding an ϵ-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver. We prove that under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an ϵ-optimal policy in a γ-discounted MDP is $O\left(K/\left((1-\gamma)^{3}\epsilon^{2}\right)\right)$, which only depends on the dimensionality K of the feature space and has no dependence on the state or action space. |
| Researcher Affiliation | Academia | Qiwen Cui, School of Mathematical Science, Peking University, cuiqiwen@pku.edu.cn; Lin F. Yang, Electrical and Computer Engineering Department, University of California, Los Angeles, linyang@ee.ucla.edu |
| Pseudocode | Yes | Algorithm 1: Plug-in Solver Based Reinforcement Learning |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not describe training on specific publicly available datasets. It discusses 'generative models' for sampling, not external datasets. |
| Dataset Splits | No | The paper does not mention training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe the hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers for experimental reproducibility. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup including hyperparameters or system-level training settings. |
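
The plug-in approach summarized in the Research Type row (build an empirical model from generative-model samples, then plan in it with any solver) can be illustrated with a small sketch. This is not the paper's Algorithm 1. It is a toy example under stated assumptions: a 3-state, 2-action MDP with K = 2 anchor pairs, every transition row written as a known convex combination (`weights`) of the anchor rows, a known reward table, and value iteration standing in for the arbitrary plug-in solver. All names and numbers are hypothetical.

```python
import random


def estimate_anchor_rows(sample_next_state, num_anchors, num_states, n_samples, rng):
    """Estimate P(. | anchor k) from n_samples generative-model calls per anchor."""
    rows = []
    for k in range(num_anchors):
        counts = [0] * num_states
        for _ in range(n_samples):
            counts[sample_next_state(k, rng)] += 1
        rows.append([c / n_samples for c in counts])
    return rows


def build_empirical_model(weights, anchor_rows):
    """P_hat(. | s, a) = sum_k weights[s][a][k] * P_hat(. | anchor k)."""
    num_states = len(anchor_rows[0])
    return [
        [
            [sum(w * row[s2] for w, row in zip(w_sa, anchor_rows)) for s2 in range(num_states)]
            for w_sa in w_s
        ]
        for w_s in weights
    ]


def q_value(model, rewards, gamma, values, s, a):
    """One-step lookahead value of action a in state s."""
    return rewards[s][a] + gamma * sum(p * v for p, v in zip(model[s][a], values))


def value_iteration(model, rewards, gamma, iters=500):
    """Stand-in plug-in solver (assumption); any planner could be used instead."""
    n = len(model)
    values = [0.0] * n
    for _ in range(iters):
        values = [
            max(q_value(model, rewards, gamma, values, s, a) for a in range(len(model[s])))
            for s in range(n)
        ]
    policy = [
        max(range(len(model[s])), key=lambda a, s=s: q_value(model, rewards, gamma, values, s, a))
        for s in range(n)
    ]
    return values, policy


# Hypothetical toy instance: true anchor transition rows, hidden behind the sampler.
TRUE_ANCHOR_ROWS = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]


def sample_next_state(k, rng):
    """Generative model: draw one next state from anchor pair k."""
    return rng.choices(range(3), weights=TRUE_ANCHOR_ROWS[k])[0]


# weights[s][a] is a probability vector over the K = 2 anchors (assumed known).
weights = [
    [[1.0, 0.0], [0.2, 0.8]],
    [[0.7, 0.3], [0.0, 1.0]],
    [[0.5, 0.5], [0.1, 0.9]],
]
rewards = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]  # reward 1 only in state 2
gamma = 0.9

rng = random.Random(0)
anchor_rows = estimate_anchor_rows(sample_next_state, 2, 3, 2000, rng)
p_hat = build_empirical_model(weights, anchor_rows)
values, policy = value_iteration(p_hat, rewards, gamma)
```

Only the K anchor pairs are ever sampled, so the number of generative-model calls in this sketch scales with K rather than with the number of states or actions, which mirrors the dependence stated in the table.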