Model-based Reinforcement Learning and the Eluder Dimension

Authors: Ian Osband, Benjamin Van Roy

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$, where $T$ is time elapsed, $d_K$ is the Kolmogorov dimension and $d_E$ is the eluder dimension. These represent the first unified regret bounds for model-based reinforcement learning and provide state-of-the-art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm, posterior sampling for reinforcement learning (PSRL), that satisfies these bounds.
Researcher Affiliation | Academia | Ian Osband, Stanford University (iosband@stanford.edu); Benjamin Van Roy, Stanford University (bvr@stanford.edu)
Pseudocode | Yes | Algorithm 1: Posterior Sampling for Reinforcement Learning (PSRL); a minimal sketch follows this table.
Open Source Code | No | The paper does not include an unambiguous statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | This is a theoretical paper that focuses on mathematical analysis and proofs of regret bounds; it does not involve empirical training on specific datasets, nor does it mention any publicly available datasets.
Dataset Splits | No | This is a theoretical paper and does not describe empirical experiments with training, validation, or test dataset splits.
Hardware Specification | No | The paper focuses on theoretical analysis and does not describe any experimental setup or mention specific hardware used for computations.
Software Dependencies | No | The paper focuses on theoretical analysis and does not describe software dependencies with version numbers needed for replication.
Experiment Setup | No | The paper is theoretical and presents mathematical proofs and analysis of an algorithm; it does not include details about an experimental setup, hyperparameters, or training configurations.
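The Pseudocode row above refers to the paper's Algorithm 1 (PSRL). For orientation, here is a minimal, self-contained sketch of posterior sampling for reinforcement learning specialized to a finite tabular MDP with known rewards and a Dirichlet prior over transitions. The paper states the algorithm for general model classes, so this tabular instantiation, the RandomMDP toy environment, and the helper names (solve_finite_horizon, psrl) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class RandomMDP:
    """Tiny synthetic finite MDP used only to exercise the sketch.

    Interface assumed by psrl(): .R (known S x A expected rewards),
    .reset() -> state, .step(action) -> (next_state, reward).
    """

    def __init__(self, S=5, A=2, seed=1):
        rng = np.random.default_rng(seed)
        self.P = rng.dirichlet(np.ones(S), size=(S, A))  # true transition kernel, shape (S, A, S)
        self.R = rng.random((S, A))                      # expected rewards, treated as known here
        self.S, self.A = S, A
        self._rng, self._state = rng, 0

    def reset(self):
        self._state = 0
        return self._state

    def step(self, action):
        s = self._state
        self._state = self._rng.choice(self.S, p=self.P[s, action])
        return self._state, self.R[s, action]


def solve_finite_horizon(P, R, H):
    """Backward induction on a sampled MDP; returns an (H, S) array of greedy actions."""
    S, A, _ = P.shape
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                 # (S, A): immediate reward plus expected continuation value
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy


def psrl(env, S, A, H, num_episodes, seed=0):
    """Posterior sampling for RL with a Dirichlet(1) prior on each row of the transition kernel."""
    rng = np.random.default_rng(seed)
    counts = np.ones((S, A, S))       # Dirichlet pseudo-counts; the posterior is conjugate
    for _ in range(num_episodes):
        # 1. Sample one MDP from the current posterior over transition models.
        P_sample = np.zeros((S, A, S))
        for s in range(S):
            for a in range(A):
                P_sample[s, a] = rng.dirichlet(counts[s, a])
        # 2. Solve the sampled MDP and act greedily with respect to it for one episode.
        policy = solve_finite_horizon(P_sample, env.R, H)
        state = env.reset()
        for h in range(H):
            action = policy[h, state]
            next_state, _ = env.step(action)
            # 3. Update the posterior with the observed transition.
            counts[state, action, next_state] += 1
            state = next_state
    return counts


env = RandomMDP()
posterior_counts = psrl(env, env.S, env.A, H=10, num_episodes=50)
print(posterior_counts.sum() - env.S * env.A * env.S)  # equals H * num_episodes observed transitions
```

The pattern PSRL relies on is visible in steps 1 to 3: sample a single model from the posterior at the start of each episode, act optimally for that sampled model throughout the episode, then condition the posterior on the transitions actually observed.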