Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning

Authors: K. Lakshmanan, Ronald Ortner, Daniil Ryabko

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition functions. Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^(2/3)) asymptotically for reinforcement learning in a 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state spaces.
Researcher Affiliation | Academia | K. Lakshmanan EMAIL Montanuniversität Leoben, Franz-Josef-Strasse 18, 8700 Leoben, AUSTRIA; Ronald Ortner EMAIL Montanuniversität Leoben, Franz-Josef-Strasse 18, 8700 Leoben, AUSTRIA; Daniil Ryabko EMAIL INRIA Lille Nord Europe, 40 Avenue Halley, 59650 Villeneuve d'Ascq, FRANCE
Pseudocode | Yes | Algorithm 1: UCCRL-Kernel Density Algorithm
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and operates within the state space [0, 1] for its Markov Decision Process (MDP) model. It does not use or provide access to a specific dataset for training.
Dataset Splits | No | This is a theoretical paper and does not involve empirical experiments with dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and focuses on mathematical derivations and algorithm design. It does not describe any empirical experiments, so no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and describes an algorithm (UCCRL-KD) and its regret bounds. It does not specify any software dependencies with version numbers required to replicate empirical results.
Experiment Setup | No | The paper is theoretical, describing an algorithm and deriving regret bounds. It does not include details about an experimental setup, hyperparameters, or training configurations.
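Since no source code is released (per the table above), the kernel density estimation step at the core of UCCRL-KD can only be sketched. The fragment below is a minimal illustration, not the authors' implementation: it estimates a transition density p(s' | s, a) on the state space [0, 1] from observed next states, using a Gaussian kernel and a bandwidth value that are both hypothetical choices for this example.

```python
import numpy as np

def kde_transition_density(next_states, query, bandwidth=0.1):
    """Nonparametric KDE of a transition density on [0, 1].

    `next_states` are observed successor states collected after taking a
    fixed action in (a region of) a fixed state; `query` is the point s'
    at which to evaluate the estimated density. Gaussian kernel; the
    bandwidth is an illustrative choice, not a value from the paper.
    """
    next_states = np.asarray(next_states, dtype=float)
    n = len(next_states)
    # Standard KDE: average of kernels centered at each observed next state.
    u = (query - next_states) / bandwidth
    kernel_vals = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel_vals.sum() / (n * bandwidth)

# Example: next states observed after some hypothetical (state, action) pair.
samples = [0.42, 0.45, 0.40, 0.48, 0.44]
density_at_center = kde_transition_density(samples, 0.44, bandwidth=0.05)
```

The estimate is highest near the cluster of observed next states and integrates to one over the real line, which is what lets smoothness assumptions on the true transition density translate into the estimation-error (and hence regret) bounds the paper derives.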