Provable Reset-free Reinforcement Learning by No-Regret Reduction

Authors: Hoai-An Nguyen, Ching-An Cheng

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Our algorithm achieves $\tilde{O}(\sqrt{d^3 H^4 K})$ regret and $\tilde{O}(\sqrt{d^3 H^4 K})$ resets with high probability, where d is the feature dimension, H is the length of an episode, and K is the total number of episodes. This example serves to ground our abstract framework and to illustrate concretely how an algorithm instantiating our framework might operate. Therefore, we do not make sacrifices in terms of the regret and total number of resets when specializing our abstract framework.
Researcher Affiliation | Collaboration | ¹Microsoft Research, ²Rutgers University.
Pseudocode | Yes | Algorithm 1 Primal-Dual Reset-Free RL Algorithm for Linear MDP with Adaptive Initial States
Open Source Code | No | The paper does not provide any explicit statements about the availability of open-source code or links to repositories.
Open Datasets | No | The paper describes a theoretical framework and an algorithm for a 'linear MDP setting' and does not mention the use of any specific publicly available datasets for experimental evaluation.
Dataset Splits | No | The paper presents a theoretical framework and algorithm, but it does not include empirical experiments on datasets, thus no training, validation, or test splits are provided.
Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not specify any hardware used for running experiments.
Software Dependencies | No | The paper presents an algorithm but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions).
Experiment Setup | No | The paper describes Algorithm 1 with theoretical parameters (e.g., α and β are defined by expressions involving K, H, B, d, p, rather than specific numeric values), but it does not provide concrete hyperparameter values, training configurations, or other system-level settings for an empirical experimental setup.