Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
Authors: K. Lakshmanan, Ronald Ortner, Daniil Ryabko
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^{2/3}) asymptotically for reinforcement learning in a 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state spaces. (A minimal, illustrative kernel-density sketch appears after this table.) |
| Researcher Affiliation | Academia | K. Lakshmanan (LKSHMNAN.K@GMAIL.COM), Montanuniversität Leoben, Franz-Josef-Strasse 18, 8700 Leoben, Austria; Ronald Ortner (RORTNER@UNILEOBEN.AC.AT), Montanuniversität Leoben, Franz-Josef-Strasse 18, 8700 Leoben, Austria; Daniil Ryabko (DANIIL@RYABKO.NET), INRIA Lille - Nord Europe, 40 Avenue Halley, 59650 Villeneuve d'Ascq, France |
| Pseudocode | Yes | Algorithm 1: UCCRL-Kernel Density (UCCRL-KD) Algorithm |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and operates within a defined state space "[0, 1]" for its Markov Decision Process (MDP) model. It does not use or provide access to a specific dataset for training. |
| Dataset Splits | No | This is a theoretical paper and does not involve empirical experiments with dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical derivations and algorithm design. It does not describe any empirical experiments, and therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and describes an algorithm (UCCRL-KD) and its regret bounds. It does not specify any software dependencies with version numbers required to replicate empirical results. |
| Experiment Setup | No | The paper is theoretical, describing an algorithm and deriving regret bounds. It does not include details about an experimental setup, hyperparameters, or training configurations. |
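
The paper's key technical ingredient, as summarized in the table above, is non-parametric kernel density estimation of the transition probability densities on the state space [0, 1]. Since no source code is available, the following is only a minimal sketch of that estimation step, assuming a Gaussian kernel and a simple n^(-1/3) bandwidth rule; the function name and parameter choices are hypothetical, and the paper's UCCRL-KD algorithm additionally embeds such estimates in a UCRL-style optimistic planning loop and chooses the bandwidth according to the smoothness of the transition densities.

```python
# A minimal, illustrative sketch of non-parametric kernel density estimation
# of a transition density p(. | s, a) on the state space [0, 1] from observed
# next-state samples. This is NOT the paper's UCCRL-KD algorithm: the Gaussian
# kernel, the n**(-1/3) bandwidth rule, and all names below are assumptions
# made purely for illustration.
import numpy as np


def kde_transition_density(next_states, query_points, bandwidth=None):
    """Kernel density estimate of a transition density at the given query points.

    next_states  : observed next states in [0, 1] for one (state, action) pair
    query_points : points in [0, 1] at which to evaluate the estimate
    bandwidth    : kernel bandwidth h; defaults to a simple n**(-1/3) rule
                   (an assumed choice, not the paper's bandwidth schedule)
    """
    next_states = np.asarray(next_states, dtype=float)
    query_points = np.asarray(query_points, dtype=float)
    n = len(next_states)
    if bandwidth is None:
        bandwidth = max(n, 1) ** (-1.0 / 3.0)
    # Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi), averaged over the
    # samples and rescaled by the bandwidth to obtain a density estimate.
    u = (query_points[:, None] - next_states[None, :]) / bandwidth
    kernel_vals = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel_vals.mean(axis=1) / bandwidth


# Usage example with simulated data: the estimate should integrate to roughly 1.
rng = np.random.default_rng(0)
samples = np.clip(rng.normal(loc=0.6, scale=0.1, size=500), 0.0, 1.0)
grid = np.linspace(0.0, 1.0, 101)
density = kde_transition_density(samples, grid)
print(np.trapz(density, grid))
```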