Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Near-optimal Reinforcement Learning in Factored MDPs
Authors: Ian Osband, Benjamin Van Roy
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our focus in this paper is upon the statistical aspect of the learning problem and like earlier discussions we do not specify which computational methods are used. Our results serve as a reduction of the reinforcement learning problem to finding an approximate solution for a given FMDP. |
| Researcher Affiliation | Academia | Ian Osband, Stanford University, EMAIL; Benjamin Van Roy, Stanford University, EMAIL |
| Pseudocode | Yes | Algorithm 1 PSRL (Posterior Sampling) and Algorithm 2 UCRL-Factored (Optimism) |
| Open Source Code | No | Our focus in this paper is upon the statistical aspect of the learning problem and like earlier discussions we do not specify which computational methods are used. |
| Open Datasets | No | This paper is theoretical and does not describe experiments involving specific datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details or hyperparameters. |
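The paper's Algorithm 1, PSRL (Posterior Sampling), repeatedly samples an MDP from the agent's posterior, solves it, and acts on the resulting policy while updating the posterior. The following is a minimal tabular sketch under simplifying assumptions of ours, not the paper's: rewards are known, the transition posterior is an independent Dirichlet per state-action pair, and the sampled MDP is solved by finite-horizon value iteration. The function name and signature are illustrative.

```python
import numpy as np

def psrl_average_reward(true_P, true_R, horizon, n_episodes, seed=0):
    """Sketch of PSRL on a tabular MDP (illustrative, not the paper's code).

    true_P : (S, A, S) environment transition probabilities (unknown to agent)
    true_R : (S, A) mean rewards, assumed known here for simplicity
    Returns the average per-episode reward collected while learning.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = true_P.shape
    # Dirichlet(1, ..., 1) prior over each transition distribution
    counts = np.ones((n_states, n_actions, n_states))
    total = 0.0
    for _ in range(n_episodes):
        # 1. Sample one plausible MDP from the current posterior
        P = np.empty_like(counts)
        for s in range(n_states):
            for a in range(n_actions):
                P[s, a] = rng.dirichlet(counts[s, a])
        # 2. Solve the sampled MDP by finite-horizon value iteration
        V = np.zeros(n_states)
        policy = np.zeros((horizon, n_states), dtype=int)
        for h in reversed(range(horizon)):
            Q = true_R + P @ V          # (S, A): reward plus expected value
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)
        # 3. Act in the true MDP and update the posterior counts
        s = 0
        for h in range(horizon):
            a = policy[h, s]
            total += true_R[s, a]
            s_next = rng.choice(n_states, p=true_P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return total / n_episodes
```

Algorithm 2, UCRL-Factored, replaces step 1 with an optimistic choice over a confidence set of MDPs; the per-episode solve-then-act structure is the same.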