Model-based Reinforcement Learning and the Eluder Dimension

Authors: Ian Osband, Benjamin Van Roy

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$, where $T$ is time elapsed, $d_K$ is the Kolmogorov dimension and $d_E$ is the eluder dimension. These represent the first unified regret bounds for model-based reinforcement learning and provide state-of-the-art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm, posterior sampling for reinforcement learning (PSRL), that satisfies these bounds.
Researcher Affiliation | Academia | Ian Osband, Stanford University (iosband@stanford.edu); Benjamin Van Roy, Stanford University (bvr@stanford.edu)
Pseudocode | Yes | Algorithm 1: Posterior Sampling for Reinforcement Learning (PSRL); a minimal sketch follows this table.
Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | This is a theoretical paper that focuses on mathematical analysis and proofs of regret bounds; it does not involve empirical training on specific datasets, nor does it mention any publicly available datasets.
Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments with training, validation, or test dataset splits.
Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any experimental setup or mention specific hardware used for computations.
Software Dependencies | No | The paper focuses on theoretical analysis and does not describe software dependencies with version numbers needed for replication.
Experiment Setup | No | The paper is theoretical and presents mathematical proofs and analysis of an algorithm; it does not include details about an experimental setup, hyperparameters, or training configurations.
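The Pseudocode row above refers to the paper's Algorithm 1 (PSRL). For orientation, here is a minimal, self-contained sketch of posterior sampling for reinforcement learning specialized to a finite tabular MDP with known rewards and a Dirichlet prior over transitions. The paper states the algorithm for general model classes, so this tabular instantiation, the RandomMDP toy environment, and the helper names (solve_finite_horizon, psrl) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class RandomMDP:
    """Tiny synthetic finite MDP used only to exercise the sketch.

    Interface assumed by psrl(): .R (known S x A expected rewards),
    .reset() -> state, .step(action) -> (next_state, reward).
    """

    def __init__(self, S=5, A=2, seed=1):
        rng = np.random.default_rng(seed)
        self.P = rng.dirichlet(np.ones(S), size=(S, A))  # true transition kernel, shape (S, A, S)
        self.R = rng.random((S, A))                      # expected rewards, treated as known here
        self.S, self.A = S, A
        self._rng, self._state = rng, 0

    def reset(self):
        self._state = 0
        return self._state

    def step(self, action):
        s = self._state
        self._state = self._rng.choice(self.S, p=self.P[s, action])
        return self._state, self.R[s, action]


def solve_finite_horizon(P, R, H):
    """Backward induction on a sampled MDP; returns an (H, S) array of greedy actions."""
    S, A, _ = P.shape
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                 # (S, A): immediate reward plus expected continuation value
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy


def psrl(env, S, A, H, num_episodes, seed=0):
    """Posterior sampling for RL with a Dirichlet(1) prior on each row of the transition kernel."""
    rng = np.random.default_rng(seed)
    counts = np.ones((S, A, S))       # Dirichlet pseudo-counts; the posterior is conjugate
    for _ in range(num_episodes):
        # 1. Sample one MDP from the current posterior over transition models.
        P_sample = np.zeros((S, A, S))
        for s in range(S):
            for a in range(A):
                P_sample[s, a] = rng.dirichlet(counts[s, a])
        # 2. Solve the sampled MDP and act greedily with respect to it for one episode.
        policy = solve_finite_horizon(P_sample, env.R, H)
        state = env.reset()
        for h in range(H):
            action = policy[h, state]
            next_state, _ = env.step(action)
            # 3. Update the posterior with the observed transition.
            counts[state, action, next_state] += 1
            state = next_state
    return counts


env = RandomMDP()
posterior_counts = psrl(env, env.S, env.A, H=10, num_episodes=50)
print(posterior_counts.sum() - env.S * env.A * env.S)  # equals H * num_episodes observed transitions
```

The pattern PSRL relies on is visible in steps 1 to 3: sample a single model from the posterior at the start of each episode, act optimally for that sampled model throughout the episode, then condition the posterior on the transitions actually observed.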