Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Authors: Qiwen Cui, Lin Yang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work studies the sample complexity of finding an ϵ-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. The problem is solved via a plug-in solver approach, which builds an empirical model and then plans in that empirical model with an arbitrary plug-in solver. The paper proves that, under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an ϵ-optimal policy in a γ-discounted MDP is O(K/((1 − γ)³ϵ²)), which depends only on the dimensionality K of the feature space and has no dependence on the sizes of the state or action spaces (the bound is written out after the table below).
Researcher Affiliation | Academia | Qiwen Cui, School of Mathematical Sciences, Peking University (cuiqiwen@pku.edu.cn); Lin F. Yang, Electrical and Computer Engineering Department, University of California, Los Angeles (linyang@ee.ucla.edu)
Pseudocode | Yes | Algorithm 1: Plug-in Solver Based Reinforcement Learning (a hedged code sketch of this kind of procedure follows after the table).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not describe training on specific publicly available datasets. It discusses generative models for sampling, not external datasets.
Dataset Splits | No | The paper does not mention training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe the hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list specific software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup, hyperparameters, or system-level training settings.
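
As a reading aid for the Research Type row above, the complexity claim can be written out explicitly. The decomposition below reflects how the anchor-state assumption is typically used (anchor features equal to standard basis vectors is an assumption of this note, not a quotation from the paper): under it, the transition model is determined by the K anchor rows,

$$P(\cdot \mid s, a) \;=\; \sum_{k=1}^{K} \phi_k(s, a)\, P(\cdot \mid s_k, a_k), \qquad \phi_k(s, a) \ge 0,$$

and the reported minimax sample complexity for returning a policy $\hat{\pi}$ with $V^{\hat{\pi}}(s) \ge V^{*}(s) - \epsilon$ for all $s$ is

$$O\!\left(\frac{K}{(1-\gamma)^{3}\,\epsilon^{2}}\right)$$

calls to the generative model, independent of the sizes of the state and action spaces.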
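
The Pseudocode row refers to Algorithm 1 (Plug-in Solver Based Reinforcement Learning). Below is a minimal, hedged Python sketch of that kind of procedure, not the paper's exact algorithm: it assumes a tabular generative-model interface generative_model(s, a), a known reward table, and anchor state-action pairs whose features are standard basis vectors; all names (plug_in_solver_rl, phi, anchors, n_samples) are illustrative, and value iteration merely stands in for the arbitrary plug-in solver.

```python
import numpy as np

def plug_in_solver_rl(phi, anchors, generative_model, reward,
                      gamma=0.9, n_samples=1000, n_vi_iters=500):
    """Hedged sketch of a plug-in solver for feature-based RL with anchors.

    phi              : (S, A, K) feature array; each phi[s, a] is assumed to be a
                       non-negative combination of the K anchor features.
    anchors          : list of K (state, action) pairs whose features are the
                       standard basis vectors (an assumption of this sketch).
    generative_model : callable (s, a) -> next state sampled from P(. | s, a).
    reward           : (S, A) array of known rewards.
    """
    num_states, num_actions, K = phi.shape

    # Step 1: query the generative model only at the K anchor pairs and build
    # empirical next-state distributions for them.
    P_anchor_hat = np.zeros((K, num_states))
    for k, (s, a) in enumerate(anchors):
        for _ in range(n_samples):
            P_anchor_hat[k, generative_model(s, a)] += 1.0
        P_anchor_hat[k] /= n_samples

    # Step 2: extend the estimate to every (s, a) through the feature map:
    #   P_hat(. | s, a) = sum_k phi_k(s, a) * P_anchor_hat(. | s_k, a_k)
    P_hat = phi @ P_anchor_hat  # shape (S, A, S')

    # Step 3: plan in the empirical model with an arbitrary plug-in solver;
    # plain value iteration is used here purely as a placeholder solver.
    V = np.zeros(num_states)
    for _ in range(n_vi_iters):
        Q = reward + gamma * (P_hat @ V)  # shape (S, A)
        V = Q.max(axis=1)

    return Q.argmax(axis=1)  # greedy policy in the empirical MDP


# Tiny usage example on a random 2-feature MDP (purely illustrative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, K = 5, 2, 2
    P_anchor_true = rng.dirichlet(np.ones(S), size=K)   # true anchor rows
    phi = rng.dirichlet(np.ones(K), size=(S, A))        # non-negative features
    anchors = [(0, 0), (1, 1)]
    phi[0, 0] = np.eye(K)[0]                             # anchor features are
    phi[1, 1] = np.eye(K)[1]                             # basis vectors
    true_P = phi @ P_anchor_true

    def generative_model(s, a):
        return rng.choice(S, p=true_P[s, a])

    reward = rng.random((S, A))
    policy = plug_in_solver_rl(phi, anchors, generative_model, reward)
    print("greedy policy:", policy)
```

Note that sampling is concentrated on the K anchor pairs, which is what makes the number of generative-model calls scale with K rather than with the number of state-action pairs.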