Near-optimal Reinforcement Learning in Factored MDPs
Authors: Ian Osband, Benjamin Van Roy
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our focus in this paper is upon the statistical aspect of the learning problem and like earlier discussions we do not specify which computational methods are used. Our results serve as a reduction of the reinforcement learning problem to finding an approximate solution for a given FMDP. |
| Researcher Affiliation | Academia | Ian Osband Stanford University iosband@stanford.edu Benjamin Van Roy Stanford University bvr@stanford.edu |
| Pseudocode | Yes | Algorithm 1 PSRL (Posterior Sampling) and Algorithm 2 UCRL-Factored (Optimism) |
| Open Source Code | No | Our focus in this paper is upon the statistical aspect of the learning problem and like earlier discussions we do not specify which computational methods are used. |
| Open Datasets | No | This paper is theoretical and does not describe experiments involving specific datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details or hyperparameters. |
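The pseudocode row above refers to Algorithm 1 (PSRL) from the paper. As a minimal illustration of the posterior-sampling idea, the sketch below runs one PSRL episode in a plain tabular MDP with a Dirichlet posterior over transitions and known rewards. These are simplifying assumptions for illustration only: the paper's setting is factored MDPs, rewards are also learned there, and the function name `psrl_episode` is hypothetical, not from the paper.

```python
import numpy as np

def psrl_episode(counts, rewards, horizon, rng):
    """One episode of posterior-sampling RL (an illustrative sketch).

    counts[s, a, s'] are Dirichlet posterior parameters over transitions;
    rewards[s, a] are assumed known for simplicity (an assumption made
    here, not part of the paper's general setting).
    """
    S, A, _ = counts.shape
    # 1. Sample one MDP from the posterior over transition dynamics.
    P = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                  for s in range(S)])
    # 2. Solve the sampled MDP by finite-horizon value iteration.
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        Q = rewards + P @ V            # Q[s, a] under the sampled MDP
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    # 3. The agent would now follow `policy` for the episode and update
    #    `counts` with the observed transitions (omitted here).
    return P, policy

# Toy usage: 3 states, 2 actions, uniform Dirichlet(1) prior.
rng = np.random.default_rng(0)
counts = np.ones((3, 2, 3))
rewards = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
P, policy = psrl_episode(counts, rewards, horizon=4, rng=rng)
```

The key design point the sketch tries to convey is that PSRL is optimal with respect to a *sample* from the posterior rather than an optimistic upper bound, which is what distinguishes it from Algorithm 2 (UCRL-Factored).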