Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
Authors: Bingyan Wang, Yuling Yan, Jianqing Fan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies the sample complexity of both model-based and model-free RL under a discounted infinite-horizon MDP with a feature-based linear transition model. We establish tight sample complexity bounds for both model-based approaches and Q-learning, which scale linearly with the feature dimension K instead of \|S\|\|A\|, thus considerably reducing the required sample size for large-scale MDPs when K is relatively small. Our results are sharp, and the sample complexity bound for the model-based approach matches the minimax lower bound. [...] We show that a model-based approach (resp. Q-learning) provably learns an ε-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of K/((1−γ)^3 ε^2) (resp. K/((1−γ)^4 ε^2)), up to some logarithmic factor. (A worked example of these bounds appears below the table.) |
| Researcher Affiliation | Academia | Bingyan Wang (Princeton University, bingyanw@princeton.edu); Yuling Yan (Princeton University, yulingy@princeton.edu); Jianqing Fan (Princeton University, jqfan@princeton.edu) |
| Pseudocode | Yes | Algorithm 1 Model-based RL with a generative model [...] Algorithm 2 Vanilla Q-learning for infinite-horizon discounted MDPs. (A hedged Python sketch of Algorithm 2 appears below the table.) |
| Open Source Code | No | No information found regarding open-source code for the methodology described in the paper. |
| Open Datasets | No | The paper is theoretical and assumes access to a "generative model" that provides independent samples. It does not mention using or providing access to any specific public or open datasets. |
| Dataset Splits | No | No information found regarding specific dataset splits for training, validation, or testing, as the paper is theoretical and focuses on sample complexity analysis using a generative model. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud specifications) are mentioned for running experiments. The paper focuses on theoretical analysis. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers) are mentioned for replicating experiments. |
| Experiment Setup | No | No specific experimental setup details, such as concrete hyperparameter values or training configurations, are provided. The paper is theoretical and focuses on sample complexity analysis. |
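
To make the abstract's headline bounds concrete, here is a short worked example of the two sample complexities. The plug-in values (K = 100, γ = 0.99, ε = 0.1) are illustrative assumptions, not numbers from the paper.

```latex
% Sample-complexity bounds from the abstract (up to logarithmic factors):
%   model-based RL: N ~ K / ((1-gamma)^3 * eps^2)
%   Q-learning:     N ~ K / ((1-gamma)^4 * eps^2)
% Illustrative plug-in (assumed values): K = 100, gamma = 0.99, eps = 0.1.
\[
\underbrace{\frac{K}{(1-\gamma)^{3}\varepsilon^{2}}}_{\text{model-based}}
  = \frac{100}{(0.01)^{3}(0.1)^{2}} = 10^{10},
\qquad
\underbrace{\frac{K}{(1-\gamma)^{4}\varepsilon^{2}}}_{\text{Q-learning}}
  = \frac{100}{(0.01)^{4}(0.1)^{2}} = 10^{12}.
\]
```

Either bound depends on the feature dimension K rather than on \|S\|\|A\|, which is the source of the savings for large state-action spaces that the abstract highlights.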
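Since no code is released, the following is a minimal Python sketch of synchronous Q-learning driven by a generative model, in the spirit of Algorithm 2. It is a tabular sketch under assumed choices: the step-size schedule, the sampler interface `sample_next_state`, and the toy MDP are all illustrative, and the paper's actual algorithm exploits the K-dimensional linear transition representation, which this sketch omits.

```python
import numpy as np

def q_learning_generative(sample_next_state, rewards, num_states, num_actions,
                          gamma=0.99, num_iters=10_000):
    """Synchronous Q-learning with a generative model (illustrative sketch).

    sample_next_state(s, a) draws s' ~ P(. | s, a) via the generative model.
    rewards: (num_states, num_actions) array of deterministic rewards in [0, 1].
    """
    Q = np.zeros((num_states, num_actions))
    for t in range(1, num_iters + 1):
        # Assumed rescaled-linear step size; the paper's schedule may differ.
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)
        for s in range(num_states):
            for a in range(num_actions):
                s_next = sample_next_state(s, a)   # one fresh sample per (s, a)
                target = rewards[s, a] + gamma * Q[s_next].max()
                Q[s, a] = (1.0 - eta) * Q[s, a] + eta * target
    return Q

# Toy usage on a random tabular MDP (purely illustrative).
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over s'
R = rng.uniform(size=(S, A))
Q_hat = q_learning_generative(lambda s, a: rng.choice(S, p=P[s, a]), R, S, A,
                              gamma=0.9, num_iters=2000)
print(Q_hat.round(2))
```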