Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Authors: Bingyan Wang, Yuling Yan, Jianqing Fan

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies the sample complexity of both model-based and model-free RL under a discounted infinite-horizon MDP with a feature-based linear transition model. We establish tight sample complexity bounds for both model-based approaches and Q-learning, which scale linearly with the feature dimension K instead of |S||A|, thus considerably reducing the required sample size for large-scale MDPs when K is relatively small. Our results are sharp, and the sample complexity bound for the model-based approach matches the minimax lower bound. [...] We show that a model-based approach (resp. Q-learning) provably learns an ε-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of K/((1−γ)^3 ε^2) (resp. K/((1−γ)^4 ε^2)), up to some logarithmic factor.
Researcher Affiliation | Academia | Bingyan Wang (Princeton University, bingyanw@princeton.edu); Yuling Yan (Princeton University, yulingy@princeton.edu); Jianqing Fan (Princeton University, jqfan@princeton.edu)
Pseudocode | Yes | Algorithm 1: Model-based RL with a generative model [...] Algorithm 2: Vanilla Q-learning for infinite-horizon discounted MDPs
Open Source Code | No | No information found regarding open-source code for the methodology described in the paper.
Open Datasets | No | The paper is theoretical and assumes access to a "generative model" that provides independent samples. It does not mention using or providing access to any specific public or open datasets.
Dataset Splits | No | No information found regarding specific dataset splits for training, validation, or testing, as the paper is theoretical and focuses on sample complexity analysis using a generative model.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud specifications) are mentioned for running experiments. The paper focuses on theoretical analysis.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers) are mentioned for replicating experiments.
Experiment Setup | No | No specific experimental setup details, such as concrete hyperparameter values or training configurations, are provided. The paper is theoretical and focuses on sample complexity analysis.
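The Pseudocode row refers to Algorithm 2, vanilla Q-learning under a generative model (synchronous sampling: one fresh next-state draw per state-action pair at each iteration). A minimal tabular sketch of that idea follows; the toy MDP, step-size schedule, and iteration count are illustrative assumptions, and it omits the paper's linearly-parameterized transition structure:

```python
import numpy as np

# Hypothetical toy MDP (3 states, 2 actions); the paper's feature-based
# linear transition model is not modeled in this tabular sketch.
rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.8
P = rng.dirichlet(np.ones(S), size=(S, A))  # generative model: P[s, a] = P(. | s, a)
r = rng.uniform(size=(S, A))                # deterministic rewards in [0, 1]

def q_learning(num_iters=30000, lr0=0.5):
    """Synchronous ("vanilla") Q-learning: each iteration draws one fresh
    sample s' ~ P(. | s, a) from the generative model for every (s, a)."""
    Q = np.zeros((S, A))
    for t in range(1, num_iters + 1):
        eta = lr0 / (1 + (1 - gamma) * t)   # illustrative decaying step size
        for s in range(S):
            for a in range(A):
                s_next = rng.choice(S, p=P[s, a])
                target = r[s, a] + gamma * Q[s_next].max()
                Q[s, a] += eta * (target - Q[s, a])
    return Q

def value_iteration(tol=1e-10):
    """Exact Q* computed from the true model, for reference."""
    Q = np.zeros((S, A))
    while True:
        Q_new = r + gamma * P @ Q.max(axis=1)  # Bellman optimality update
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

Q_hat, Q_star = q_learning(), value_iteration()
print("max |Q_hat - Q*| =", np.abs(Q_hat - Q_star).max())
```

Algorithm 1 (the model-based approach) would instead use the same generative samples to build an empirical transition model and then run value iteration on that estimate, which is why its bound improves the (1−γ) dependence from fourth to third power.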