Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

Authors: Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). ... We present Reg ORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. ... We report a non-asymptotic high-probability sample complexity bound for Reg ORL to yield an ε-optimal policy, which makes appear a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To our best knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.
Researcher Affiliation Academia Roberto Cipollone Sapienza University of Rome cipollone@diag.uniroma1.it Anders Jonsson Universitat Pompeu Fabra anders.jonsson@upf.edu Alessandro Ronca University of Oxford alessandro.ronca@cs.ox.ac.uk Mohammad Sadegh Talebi University of Copenhagen m.shahi@di.ku.dk
Pseudocode Yes Algorithm 1: Full procedure (Reg ORL)
Open Source Code No The paper does not provide any statement about releasing open-source code or a link to a code repository.
Open Datasets No The paper discusses a theoretical setting where a "batch dataset D collected through interacting with an unknown (but fixed) episodic RDP R" is assumed. However, it does not specify any named, publicly available dataset used for empirical training or evaluation.
Dataset Splits No The paper is theoretical and does not conduct empirical experiments, so it does not discuss training, validation, or test dataset splits.
Hardware Specification No The paper describes theoretical algorithms and proofs, but does not mention any specific hardware used for experiments as no experiments are conducted.
Software Dependencies No The paper describes theoretical algorithms and proofs, but does not specify any software dependencies with version numbers, as no empirical implementation details are provided.
Experiment Setup No The paper is theoretical and focuses on algorithm design and sample complexity bounds. It does not include details of an experimental setup or hyperparameters, as no empirical experiments are described.