Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Provably Efficient Offline Reinforcement Learning in Regular Decision Processes
Authors: Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). ... We present Reg ORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. ... We report a non-asymptotic high-probability sample complexity bound for Reg ORL to yield an ε-optimal policy, which makes appear a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To our best knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs. |
| Researcher Affiliation | Academia | Roberto Cipollone Sapienza University of Rome EMAIL Anders Jonsson Universitat Pompeu Fabra EMAIL Alessandro Ronca University of Oxford EMAIL Mohammad Sadegh Talebi University of Copenhagen EMAIL |
| Pseudocode | Yes | Algorithm 1: Full procedure (Reg ORL) |
| Open Source Code | No | The paper does not provide any statement about releasing open-source code or a link to a code repository. |
| Open Datasets | No | The paper discusses a theoretical setting where a "batch dataset D collected through interacting with an unknown (but fixed) episodic RDP R" is assumed. However, it does not specify any named, publicly available dataset used for empirical training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments, so it does not discuss training, validation, or test dataset splits. |
| Hardware Specification | No | The paper describes theoretical algorithms and proofs, but does not mention any specific hardware used for experiments as no experiments are conducted. |
| Software Dependencies | No | The paper describes theoretical algorithms and proofs, but does not specify any software dependencies with version numbers, as no empirical implementation details are provided. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and sample complexity bounds. It does not include details of an experimental setup or hyperparameters, as no empirical experiments are described. |