Near-optimal Reinforcement Learning in Factored MDPs

Authors: Ian Osband, Benjamin Van Roy

NeurIPS 2014

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | "Our focus in this paper is upon the statistical aspect of the learning problem and like earlier discussions we do not specify which computational methods are used. Our results serve as a reduction of the reinforcement learning problem to finding an approximate solution for a given FMDP." |
| Researcher Affiliation | Academia | Ian Osband, Stanford University (iosband@stanford.edu); Benjamin Van Roy, Stanford University (bvr@stanford.edu) |
| Pseudocode | Yes | Algorithm 1: PSRL (Posterior Sampling) and Algorithm 2: UCRL-Factored (Optimism) |
| Open Source Code | No | "Our focus in this paper is upon the statistical aspect of the learning problem and like earlier discussions we do not specify which computational methods are used." |
| Open Datasets | No | This paper is theoretical and does not describe experiments involving specific datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details or hyperparameters. |
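Since the paper provides pseudocode for Algorithm 1 (PSRL) but no implementation, a minimal sketch of the posterior-sampling idea is given below. Everything here is illustrative, not the authors' code: the Dirichlet prior, the toy tabular interface (`env_reset`, `env_step`), and all function names are assumptions, and the sketch targets a flat (non-factored) MDP for brevity.

```python
import numpy as np

def solve_mdp(P, R, horizon):
    """Finite-horizon value iteration; returns a greedy policy per step.

    P: transitions with shape (S, A, S); R: mean rewards with shape (S, A).
    """
    S, A = R.shape
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        Q = R + P @ V          # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def psrl(env_step, env_reset, S, A, horizon, episodes, rng):
    """Sketch of posterior sampling for RL (PSRL) with a Dirichlet prior."""
    counts = np.ones((S, A, S))        # Dirichlet(1) prior over transitions
    reward_sum = np.zeros((S, A))      # running sums for a mean-reward estimate
    reward_n = np.ones((S, A))
    for _ in range(episodes):
        # Sample one plausible MDP from the posterior, solve it, act greedily.
        P = np.stack([[rng.dirichlet(counts[s, a]) for a in range(A)]
                      for s in range(S)])
        R = reward_sum / reward_n
        policy = solve_mdp(P, R, horizon)
        s = env_reset()
        for h in range(horizon):
            a = policy[h, s]
            s_next, r = env_step(s, a)
            counts[s, a, s_next] += 1  # update posterior with observed data
            reward_sum[s, a] += r
            reward_n[s, a] += 1
            s = s_next
    return counts
```

The paper's factored setting would replace the single transition posterior with one posterior per factor of the FMDP scope; the episode loop itself is unchanged.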