Worst-Case Offline Reinforcement Learning with Arbitrary Data Support
Authors: Kohei Miyaguchi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a method of offline reinforcement learning (RL) featuring the performance guarantee without any assumptions on the data support. Under such conditions, estimating or optimizing the conventional performance metric is generally infeasible due to the distributional discrepancy between data and target policy distributions. To address this issue, we employ a worst-case policy value as a new metric and constructively show that the sample complexity bound of O(ϵ⁻²) is attainable without any data-support conditions, where ϵ > 0 is the policy suboptimality in the new metric. Moreover, as the new metric generalizes the conventional one, the algorithm can address standard offline RL tasks without modification. In this context, our sample complexity bound can be seen as a strict improvement on the previous bounds under the single-policy concentrability and the single-policy realizability. |
| Researcher Affiliation | Industry | Kohei Miyaguchi, IBM Research Tokyo, Tokyo, Japan (koheimiyaguchi@gmail.com). The author is affiliated with LY Corporation at the time of publication. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. The methods are described mathematically. |
| Open Source Code | No | The paper is theoretical and does not mention providing open-source code for the methodology it describes. |
| Open Datasets | No | The paper is theoretical and does not report empirical experiments using specific datasets. It mentions a conceptual "offline dataset D" but provides no concrete access information or citation for a publicly available dataset. |
| Dataset Splits | No | The paper is theoretical and does not report empirical experiments, thus no training/test/validation dataset splits are specified. |
| Hardware Specification | No | The paper is purely theoretical and does not describe any hardware specifications used for running experiments. |
| Software Dependencies | No | The paper is purely theoretical and does not describe any specific software dependencies with version numbers used for experiments. |
| Experiment Setup | No | The paper is purely theoretical and does not provide details about an experimental setup, such as hyperparameters or training settings. |