Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
Authors: K. Lakshmanan, Ronald Ortner, Daniil Ryabko
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^{2/3}) asymptotically for reinforcement learning in a 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state spaces. (A minimal, illustrative kernel-density sketch appears after this table.) |
| Researcher Affiliation | Academia | K. Lakshmanan (LKSHMNAN.K@GMAIL.COM), Montanuniversität Leoben, Franz-Josef-Strasse 18, 8700 Leoben, Austria; Ronald Ortner (RORTNER@UNILEOBEN.AC.AT), Montanuniversität Leoben, Franz-Josef-Strasse 18, 8700 Leoben, Austria; Daniil Ryabko (DANIIL@RYABKO.NET), INRIA Lille - Nord Europe, 40 Avenue Halley, 59650 Villeneuve d'Ascq, France |
| Pseudocode | Yes | Algorithm 1: UCCRL-Kernel Density (UCCRL-KD) Algorithm |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and operates within a defined state space "[0, 1]" for its Markov Decision Process (MDP) model. It does not use or provide access to a specific dataset for training. |
| Dataset Splits | No | This is a theoretical paper and does not involve empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical derivations and algorithm design. It does not describe any empirical experiments, and therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and describes an algorithm (UCCRL-KD) and its regret bounds. It does not specify any software dependencies with version numbers required to replicate empirical results. |
| Experiment Setup | No | The paper is theoretical, describing an algorithm and deriving regret bounds. It does not include details about an experimental setup, hyperparameters, or training configurations. |
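
The paper's key technical ingredient, as summarized in the table above, is non-parametric kernel density estimation of the transition probability densities on the state space [0, 1]. Since no source code is available, the following is only a minimal sketch of that estimation step, assuming a Gaussian kernel and a simple n^(-1/3) bandwidth rule; the function name and parameter choices are hypothetical, and the paper's UCCRL-KD algorithm additionally embeds such estimates in a UCRL-style optimistic planning loop and chooses the bandwidth according to the smoothness of the transition densities.

```python
# A minimal, illustrative sketch of non-parametric kernel density estimation
# of a transition density p(. | s, a) on the state space [0, 1] from observed
# next-state samples. This is NOT the paper's UCCRL-KD algorithm: the Gaussian
# kernel, the n**(-1/3) bandwidth rule, and all names below are assumptions
# made purely for illustration.
import numpy as np


def kde_transition_density(next_states, query_points, bandwidth=None):
    """Kernel density estimate of a transition density at the given query points.

    next_states  : observed next states in [0, 1] for one (state, action) pair
    query_points : points in [0, 1] at which to evaluate the estimate
    bandwidth    : kernel bandwidth h; defaults to a simple n**(-1/3) rule
                   (an assumed choice, not the paper's bandwidth schedule)
    """
    next_states = np.asarray(next_states, dtype=float)
    query_points = np.asarray(query_points, dtype=float)
    n = len(next_states)
    if bandwidth is None:
        bandwidth = max(n, 1) ** (-1.0 / 3.0)
    # Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi), averaged over the
    # samples and rescaled by the bandwidth to obtain a density estimate.
    u = (query_points[:, None] - next_states[None, :]) / bandwidth
    kernel_vals = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel_vals.mean(axis=1) / bandwidth


# Usage example with simulated data: the estimate should integrate to roughly 1.
rng = np.random.default_rng(0)
samples = np.clip(rng.normal(loc=0.6, scale=0.1, size=500), 0.0, 1.0)
grid = np.linspace(0.0, 1.0, 101)
density = kde_transition_density(samples, grid)
print(np.trapz(density, grid))
```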