Provably Efficient Offline Reinforcement Learning in Regular Decision Processes
Authors: Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). ... We present Reg ORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. ... We report a non-asymptotic high-probability sample complexity bound for Reg ORL to yield an ε-optimal policy, which makes appear a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To our best knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs. |
| Researcher Affiliation | Academia | Roberto Cipollone Sapienza University of Rome cipollone@diag.uniroma1.it Anders Jonsson Universitat Pompeu Fabra anders.jonsson@upf.edu Alessandro Ronca University of Oxford alessandro.ronca@cs.ox.ac.uk Mohammad Sadegh Talebi University of Copenhagen m.shahi@di.ku.dk |
| Pseudocode | Yes | Algorithm 1: Full procedure (Reg ORL) |
| Open Source Code | No | The paper does not provide any statement about releasing open-source code or a link to a code repository. |
| Open Datasets | No | The paper discusses a theoretical setting where a "batch dataset D collected through interacting with an unknown (but fixed) episodic RDP R" is assumed. However, it does not specify any named, publicly available dataset used for empirical training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments, so it does not discuss training, validation, or test dataset splits. |
| Hardware Specification | No | The paper describes theoretical algorithms and proofs, but does not mention any specific hardware used for experiments as no experiments are conducted. |
| Software Dependencies | No | The paper describes theoretical algorithms and proofs, but does not specify any software dependencies with version numbers, as no empirical implementation details are provided. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and sample complexity bounds. It does not include details of an experimental setup or hyperparameters, as no empirical experiments are described. |