Provably (More) Sample-Efficient Offline RL with Options
Authors: Xiaoyan Hu, Ho-fung Leung
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we provide the first analysis of the sample complexity for offline RL with options. We propose the PEssimistic Value Iteration for Learning with Options (PEVIO) algorithm and establish near-optimal suboptimality bounds (with respect to the novel information-theoretic lower bound for offline RL with options) for two popular data-collection procedures, where the first one collects state-option transitions and the second one collects state-action transitions. We show that compared to offline RL with actions, using options not only enjoys a faster finite-time convergence rate (to the optimal value) but also attains a better performance (when either the options are carefully designed or the offline data is limited). |
| Researcher Affiliation | Academia | Xiaoyan Hu, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China (xyhu21@cse.cuhk.edu.hk); Ho-fung Leung, Independent Researcher, Hong Kong SAR, China (ho-fung.leung@outlook.com) |
| Pseudocode | Yes | Algorithm 1: PEssimistic Value Iteration for Learning with Options (PEVIO); Subroutine 2: Offline Option Evaluation (OOE) for Dataset D1; Subroutine 3: Offline Option Evaluation (OOE) for Dataset D2 |
| Open Source Code | No | The paper does not provide any statements about open-sourcing code for the methodology or include links to a code repository. |
| Open Datasets | No | The paper describes hypothetical datasets D1 and D2 based on data collection procedures ('collected by an experimenter', 'collected by executing a behavior policy') but does not name or provide access information for any specific publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and discusses data collection procedures without specifying actual datasets or their split details (e.g., percentages, sample counts, or predefined splits) for reproduction. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup requiring specific hardware. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup requiring specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup with hyperparameters or training configurations. |
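
The pessimism principle behind the algorithm named in the Pseudocode row can be illustrated with a minimal sketch. The code below is not the paper's PEVIO algorithm and does not use options: it is a generic tabular pessimistic value iteration over primitive actions, assuming a finite-horizon MDP with rewards in [0, 1]. The function name `pessimistic_value_iteration`, the tuple layout of `dataset`, the count-based penalty `beta * (horizon - h) / sqrt(n)`, and the default `beta` are all illustrative assumptions, not taken from the paper.

```python
import math


def pessimistic_value_iteration(n_states, n_actions, horizon, dataset, beta=1.0):
    """Generic tabular pessimistic value iteration (illustrative, not PEVIO).

    dataset: list of (h, s, a, r, s_next) tuples with 0 <= h < horizon
             and rewards r in [0, 1] (hypothetical layout).
    Returns (V, policy) where V[h][s] is a pessimistic value estimate.
    """
    counts, reward_sum, next_counts = {}, {}, {}
    for h, s, a, r, s_next in dataset:
        key = (h, s, a)
        counts[key] = counts.get(key, 0) + 1
        reward_sum[key] = reward_sum.get(key, 0.0) + r
        nxt = next_counts.setdefault(key, {})
        nxt[s_next] = nxt.get(s_next, 0) + 1

    # V[horizon][s] = 0 is the terminal value.
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]

    for h in reversed(range(horizon)):
        for s in range(n_states):
            best_q, best_a = 0.0, 0
            for a in range(n_actions):
                key = (h, s, a)
                n = counts.get(key, 0)
                if n == 0:
                    # Unseen pair: pessimism assigns the lowest possible value.
                    q = 0.0
                else:
                    r_hat = reward_sum[key] / n
                    next_value = sum(
                        c / n * V[h + 1][s2] for s2, c in next_counts[key].items()
                    )
                    # Assumed count-based uncertainty penalty.
                    penalty = beta * (horizon - h) / math.sqrt(n)
                    q = r_hat + next_value - penalty
                    # Clip to the valid range of remaining return.
                    q = min(max(q, 0.0), float(horizon - h))
                if q > best_q:
                    best_q, best_a = q, a
            V[h][s] = best_q
            policy[h][s] = best_a
    return V, policy
```

With this penalty, an action observed many times with a slightly lower average reward can be preferred over an action observed once with a higher reward, which is the behavior pessimism is meant to produce when offline data coverage is uneven.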