Provable Reset-free Reinforcement Learning by No-Regret Reduction
Authors: Hoai-An Nguyen, Ching-An Cheng
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our algorithm achieves $\tilde{O}(\sqrt{d^3 H^4 K})$ regret and $\tilde{O}(\sqrt{d^3 H^4 K})$ resets with high probability, where $d$ is the feature dimension, $H$ is the length of an episode, and $K$ is the total number of episodes. This example serves to ground our abstract framework and to illustrate concretely how an algorithm instantiating our framework might operate. Therefore, we do not make sacrifices in terms of the regret and total number of resets when specializing our abstract framework. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, ²Rutgers University. |
| Pseudocode | Yes | Algorithm 1 Primal-Dual Reset-Free RL Algorithm for Linear MDP with Adaptive Initial States |
| Open Source Code | No | The paper does not provide any explicit statements about the availability of open-source code or links to repositories. |
| Open Datasets | No | The paper describes a theoretical framework and an algorithm for a 'linear MDP setting' and does not mention the use of any specific publicly available datasets for experimental evaluation. |
| Dataset Splits | No | The paper presents a theoretical framework and algorithm, but it does not include empirical experiments on datasets, thus no training, validation, or test splits are provided. |
| Hardware Specification | No | The paper focuses on theoretical analysis and algorithm design and does not specify any hardware used for running experiments. |
| Software Dependencies | No | The paper presents an algorithm but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions). |
| Experiment Setup | No | The paper describes Algorithm 1 with theoretical parameters (e.g., α and β are defined by expressions involving K, H, B, d, p, rather than specific numeric values), but it does not provide concrete hyperparameter values, training configurations, or other system-level settings for an empirical experimental setup. |