Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Regret Bounds for Learning State Representations in Reinforcement Learning
Authors: Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper we introduce UCB-MS, an optimistic elimination algorithm that performs efficient exploration of the representations. For this algorithm we prove a regret bound of order e O( p|Φ|T). Our algorithm as well as our results are based on and generalize the regret bounds achieved for the UCRL2 algorithm in [9]. In particular, if Φ consists of a single Markov model we obtain the same regret bound as for UCRL2. UCB-MS employs optimism to choose a model from Φ. To avoid suffering too large regret from choosing a non-Markov model, the collected rewards are compared to regret bounds that are known to hold for Markov models. If a model fails to give sufficiently high reward, it is eliminated. On the other hand, UCB-MS is happy to employ a non-Markov model as long as it gives as much reward as it would be expected from a corresponding Markov model. |
| Researcher Affiliation | Collaboration | Ronald Ortner Montanuniversität Leoben EMAIL Matteo Pirotta Facebook AI Research EMAIL Ronan Fruit Sequel Team INRIA Lille EMAIL Alessandro Lazaric Facebook AI Research EMAIL Odalric-Ambrym Maillard Sequel Team INRIA Lille EMAIL |
| Pseudocode | Yes | Algorithm 1 UCB-Model Selection (UCB-MS) |
| Open Source Code | No | The paper does not provide any links to open-source code for the described methodology. It discusses the algorithm and its theoretical bounds. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and regret bounds in reinforcement learning. It does not describe experiments on specific datasets, so no dataset information is provided. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and regret bounds in reinforcement learning. It does not describe experiments or dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting an algorithm and its regret bounds. It does not describe an experimental setup with hyperparameters or system-level training settings. |