Model-based Reinforcement Learning and the Eluder Dimension
Authors: Ian Osband, Benjamin Van Roy
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$, where $T$ is time elapsed, $d_K$ is the Kolmogorov dimension and $d_E$ is the eluder dimension. These represent the first unified regret bounds for model-based reinforcement learning and provide state-of-the-art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm, posterior sampling for reinforcement learning (PSRL), that satisfies these bounds. (The bound is restated below the table.) |
| Researcher Affiliation | Academia | Ian Osband, Stanford University, iosband@stanford.edu; Benjamin Van Roy, Stanford University, bvr@stanford.edu |
| Pseudocode | Yes | Algorithm 1: Posterior Sampling for Reinforcement Learning (PSRL) (a hedged implementation sketch follows the table) |
| Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | This is a theoretical paper that focuses on mathematical analysis and proofs of regret bounds; it does not involve empirical training on specific datasets, nor does it mention any publicly available datasets. |
| Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments with training, validation, or test dataset splits. |
| Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any experimental setup or mention specific hardware used for computations. |
| Software Dependencies | No | The paper focuses on theoretical analysis and does not describe software dependencies with version numbers needed for replication. |
| Experiment Setup | No | The paper is theoretical and presents mathematical proofs and analysis of an algorithm; it does not include details about an experimental setup, hyperparameters, or training configurations. |
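
For reference, the regret bound quoted in the Research Type row, restated as a display equation with the symbols the abstract defines; constants and noise terms from the paper's full theorem statement are omitted here.

```latex
% Regret bound from the abstract, restated with its symbols:
%   T   -- time elapsed
%   d_K -- Kolmogorov dimension of the model class
%   d_E -- eluder dimension of the model class
\[
  \mathrm{Regret}(T) \;=\; \tilde{O}\!\left(\sqrt{d_K\, d_E\, T}\right)
\]
```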
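
Since the paper provides PSRL only as pseudocode (Algorithm 1), here is a minimal Python sketch of posterior sampling for reinforcement learning for a finite, tabular, episodic MDP. The Dirichlet transition prior, the Gaussian reward prior, the value-iteration planner, and the `env.reset()`/`env.step()` interface are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def psrl(env, n_states, n_actions, horizon, n_episodes, seed=0):
    """Sketch of PSRL for a tabular, finite-horizon, episodic MDP.

    Assumes `env.reset()` returns an integer state and `env.step(a)`
    returns `(next_state, reward)` -- a hypothetical interface.
    """
    rng = np.random.default_rng(seed)

    # Posterior sufficient statistics: Dirichlet counts for transitions,
    # Gaussian (unit observation noise) statistics for mean rewards.
    trans_counts = np.ones((n_states, n_actions, n_states))  # Dirichlet(1, ..., 1) prior
    reward_sum = np.zeros((n_states, n_actions))
    reward_n = np.zeros((n_states, n_actions))

    for _ in range(n_episodes):
        # 1. Sample one MDP from the current posterior.
        P = np.zeros((n_states, n_actions, n_states))
        for s in range(n_states):
            for a in range(n_actions):
                P[s, a] = rng.dirichlet(trans_counts[s, a])
        post_mean = reward_sum / (reward_n + 1.0)   # N(0, 1) prior on mean rewards
        post_std = 1.0 / np.sqrt(reward_n + 1.0)
        R = rng.normal(post_mean, post_std)

        # 2. Plan: finite-horizon value iteration on the sampled MDP.
        policy = np.zeros((horizon, n_states), dtype=int)
        V = np.zeros(n_states)
        for h in reversed(range(horizon)):
            Q = R + P @ V                # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[s']
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)

        # 3. Follow the sampled MDP's optimal policy and update the posterior.
        s = env.reset()
        for h in range(horizon):
            a = policy[h, s]
            s_next, r = env.step(a)
            trans_counts[s, a, s_next] += 1
            reward_sum[s, a] += r
            reward_n[s, a] += 1
            s = s_next

    return trans_counts, reward_sum, reward_n
```

The key design point the sketch illustrates is that PSRL resamples a single MDP from the posterior once per episode and acts greedily with respect to it, rather than maintaining explicit optimism bonuses.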