Is Pessimism Provably Efficient for Offline RL?
Authors: Ying Jin, Zhuoran Yang, Zhaoran Wang
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. ... We establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, it matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. ... Our theoretical contribution is fourfold: (i) We decompose the suboptimality... (ii) For any general MDP, we establish the suboptimality of PEVI... (iii) For the linear MDP... we establish the suboptimality of PEVI... (iv) We prove PEVI is minimax optimal... |
| Researcher Affiliation | Academia | 1Department of Statistics, Stanford University 2Department of Operations Research and Financial Engineering, Princeton University 3Department of Industrial Engineering and Management Sciences, Northwestern University |
| Pseudocode | Yes | Algorithm 1 Pessimistic Value Iteration (PEVI): General MDP; Algorithm 2 Pessimistic Value Iteration (PEVI): Linear MDP |
| Open Source Code | No | The paper does not contain any explicit statements about providing source code for the described methodology, nor does it include links to a code repository. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and proofs; it does not describe experiments using a specific, publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical data analysis; therefore, it does not specify dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not discuss empirical experiments; therefore, it does not mention any hardware specifications used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not discuss empirical experiments; therefore, it does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments; therefore, it does not provide details on experimental setup such as hyperparameters or training configurations. |