reproducibilityindex.ai

Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

Authors: Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). ... We present Reg ORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. ... We report a non-asymptotic high-probability sample complexity bound for Reg ORL to yield an ε-optimal policy, which makes appear a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To our best knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.
Researcher Affiliation	Academia	Roberto Cipollone Sapienza University of Rome cipollone@diag.uniroma1.it Anders Jonsson Universitat Pompeu Fabra anders.jonsson@upf.edu Alessandro Ronca University of Oxford alessandro.ronca@cs.ox.ac.uk Mohammad Sadegh Talebi University of Copenhagen m.shahi@di.ku.dk
Pseudocode	Yes	Algorithm 1: Full procedure (Reg ORL)
Open Source Code	No	The paper does not provide any statement about releasing open-source code or a link to a code repository.
Open Datasets	No	The paper discusses a theoretical setting where a "batch dataset D collected through interacting with an unknown (but fixed) episodic RDP R" is assumed. However, it does not specify any named, publicly available dataset used for empirical training or evaluation.
Dataset Splits	No	The paper is theoretical and does not conduct empirical experiments, so it does not discuss training, validation, or test dataset splits.
Hardware Specification	No	The paper describes theoretical algorithms and proofs, but does not mention any specific hardware used for experiments as no experiments are conducted.
Software Dependencies	No	The paper describes theoretical algorithms and proofs, but does not specify any software dependencies with version numbers, as no empirical implementation details are provided.
Experiment Setup	No	The paper is theoretical and focuses on algorithm design and sample complexity bounds. It does not include details of an experimental setup or hyperparameters, as no empirical experiments are described.