Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Authors: Bingyan Wang, Yuling Yan, Jianqing Fan

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies the sample complexity of both model-based and model-free RL under a discounted infinite-horizon MDP with a feature-based linear transition model. We establish tight sample complexity bounds for both model-based approaches and Q-learning, which scale linearly with the feature dimension K instead of |S||A|, thus considerably reducing the required sample size for large-scale MDPs when K is relatively small. Our results are sharp, and the sample complexity bound for the model-based approach matches the minimax lower bound. [...] We show that a model-based approach (resp. Q-learning) provably learns an ε-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of K/((1−γ)^3 ε^2) (resp. K/((1−γ)^4 ε^2)), up to some logarithmic factor.
Researcher Affiliation | Academia | Bingyan Wang (Princeton University, bingyanw@princeton.edu); Yuling Yan (Princeton University, yulingy@princeton.edu); Jianqing Fan (Princeton University, jqfan@princeton.edu)
Pseudocode | Yes | Algorithm 1: Model-based RL with a generative model [...] Algorithm 2: Vanilla Q-learning for infinite-horizon discounted MDPs
Open Source Code | No | No information found regarding open-source code for the methodology described in the paper.
Open Datasets | No | The paper is theoretical and assumes access to a "generative model" that provides independent samples. It does not mention using or providing access to any specific public or open datasets.
Dataset Splits | No | No information found regarding specific dataset splits for training, validation, or testing, as the paper is theoretical and focuses on sample complexity analysis using a generative model.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud specifications) are mentioned for running experiments. The paper focuses on theoretical analysis.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers) are mentioned for replicating experiments.
Experiment Setup | No | No specific experimental setup details, such as concrete hyperparameter values or training configurations, are provided. The paper is theoretical and focuses on sample complexity analysis.
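The Pseudocode row refers to Algorithm 2, vanilla Q-learning under a generative model (synchronous sampling: one fresh next-state draw per state-action pair at each iteration). A minimal tabular sketch of that idea follows; the toy MDP, step-size schedule, and iteration count are illustrative assumptions, and it omits the paper's linearly-parameterized transition structure:

```python
import numpy as np

# Hypothetical toy MDP (3 states, 2 actions); the paper's feature-based
# linear transition model is not modeled in this tabular sketch.
rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.8
P = rng.dirichlet(np.ones(S), size=(S, A))  # generative model: P[s, a] = P(. | s, a)
r = rng.uniform(size=(S, A))                # deterministic rewards in [0, 1]

def q_learning(num_iters=30000, lr0=0.5):
    """Synchronous ("vanilla") Q-learning: each iteration draws one fresh
    sample s' ~ P(. | s, a) from the generative model for every (s, a)."""
    Q = np.zeros((S, A))
    for t in range(1, num_iters + 1):
        eta = lr0 / (1 + (1 - gamma) * t)   # illustrative decaying step size
        for s in range(S):
            for a in range(A):
                s_next = rng.choice(S, p=P[s, a])
                target = r[s, a] + gamma * Q[s_next].max()
                Q[s, a] += eta * (target - Q[s, a])
    return Q

def value_iteration(tol=1e-10):
    """Exact Q* computed from the true model, for reference."""
    Q = np.zeros((S, A))
    while True:
        Q_new = r + gamma * P @ Q.max(axis=1)  # Bellman optimality update
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

Q_hat, Q_star = q_learning(), value_iteration()
print("max |Q_hat - Q*| =", np.abs(Q_hat - Q_star).max())
```

Algorithm 1 (the model-based approach) would instead use the same generative samples to build an empirical transition model and then run value iteration on that estimate, which is why its bound improves the (1−γ) dependence from fourth to third power.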