Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity
Authors: Laixi Shi, Gen Li, Yuting Wei, Yuxin Chen, Yuejie Chi
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | To address this inadequacy, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes, and characterize its sample complexity under the single-policy concentrability assumption which does not require the full coverage of the state-action space. In addition, a variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity. |
| Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (2) Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA. |
| Pseudocode | Yes | Algorithm 1: LCB-Q for offline RL; Algorithm 2: LCB-Q-Advantage for offline RL (a hedged sketch of the LCB-Q update follows the table). |
| Open Source Code | No | The paper does not contain any statement or link indicating the release of source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not use or specify any publicly available dataset for empirical evaluation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup, including hyperparameters or system-level training settings. |
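The pseudocode row above refers to Algorithm 1 of the paper, a pessimistic Q-learning update that subtracts a lower-confidence-bound penalty from the temporal-difference target before the usual incremental update. Below is a minimal sketch, assuming a tabular finite-horizon MDP and an offline dataset of trajectories; the function name `lcb_q`, the dataset format, the bonus constant `cb`, and the value clipping are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def lcb_q(dataset, S, A, H, cb=1.0, delta=0.01):
    """Sketch of pessimistic (LCB) Q-learning for a tabular finite-horizon MDP.

    dataset: list of trajectories, each a list of (h, s, a, r, s_next) tuples
             with 0-indexed step h; S, A: numbers of states/actions; H: horizon.
    cb, delta: bonus scaling constant and confidence level (illustrative values).
    """
    Q = np.zeros((H, S, A))             # pessimistic Q estimates
    V = np.zeros((H + 1, S))            # V[H] = 0 is the terminal value
    N = np.zeros((H, S, A), dtype=int)  # visitation counts

    for traj in dataset:
        for (h, s, a, r, s_next) in traj:
            N[h, s, a] += 1
            n = N[h, s, a]
            eta = (H + 1) / (H + n)     # rescaled learning rate used in the analysis
            # lower-confidence-bound penalty shrinking with the visit count
            bonus = cb * np.sqrt(H**3 * np.log(S * A * H / delta) / n)
            target = r + V[h + 1, s_next] - bonus
            Q[h, s, a] = (1 - eta) * Q[h, s, a] + eta * target
            # keep the value estimate in [0, H] (illustrative clipping)
            V[h, s] = min(max(Q[h, s].max(), 0.0), float(H))

    policy = Q.argmax(axis=2)           # greedy policy w.r.t. the pessimistic Q
    return Q, V, policy
```

The variance-reduced variant (Algorithm 2, LCB-Q-Advantage) builds on the same update by maintaining a reference value function to reduce the variance of the TD target; it is not reproduced here.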