reproducibilityindex.ai

Provable Reset-free Reinforcement Learning by No-Regret Reduction

Authors: Hoai-An Nguyen, Ching-An Cheng

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Our algorithm achieves Õ(√(d³H⁴K)) regret and Õ(√(d³H⁴K)) resets with high probability, where d is the feature dimension, H is the length of an episode, and K is the total number of episodes. This example serves to ground our abstract framework and to illustrate concretely how an algorithm instantiating our framework might operate. Therefore, we do not make sacrifices in terms of the regret and total number of resets when specializing our abstract framework.
Researcher Affiliation	Collaboration	¹Microsoft Research, ²Rutgers University.
Pseudocode	Yes	Algorithm 1: Primal-Dual Reset-Free RL Algorithm for Linear MDP with Adaptive Initial States
Open Source Code	No	The paper does not provide any explicit statements about the availability of open-source code or links to repositories.
Open Datasets	No	The paper describes a theoretical framework and an algorithm for a 'linear MDP setting' and does not mention the use of any specific publicly available datasets for experimental evaluation.
Dataset Splits	No	The paper presents a theoretical framework and algorithm, but it does not include empirical experiments on datasets, thus no training, validation, or test splits are provided.
Hardware Specification	No	The paper focuses on theoretical analysis and algorithm design and does not specify any hardware used for running experiments.
Software Dependencies	No	The paper presents an algorithm but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries with their versions).
Experiment Setup	No	The paper describes Algorithm 1 with theoretical parameters (e.g., α and β are defined by expressions involving K, H, B, d, p, rather than specific numeric values), but it does not provide concrete hyperparameter values, training configurations, or other system-level settings for an empirical experimental setup.