Abstraction Selection in Model-based Reinforcement Learning
Authors: Nan Jiang, Alex Kulesza, Satinder Singh
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Existing approaches have theoretical guarantees only under strong assumptions on the domain or asymptotically large amounts of data, but in this paper we propose a simple algorithm based on statistical hypothesis testing that comes with a finite-sample guarantee under assumptions on candidate abstractions. Our algorithm trades off the low approximation error of finer abstractions against the low estimation error of coarser abstractions, resulting in a loss bound that depends only on the quality of the best available abstraction and is polynomial in planning horizon. |
| Researcher Affiliation | Academia | Nan Jiang NANJIANG@UMICH.EDU Alex Kulesza KULESZA@UMICH.EDU Satinder Singh BAVEJA@UMICH.EDU Computer Science & Engineering, University of Michigan |
| Pseudocode | Yes | Algorithm 1 Compare Pair(D, H, δ) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the methodology described, nor does it provide links to a code repository. |
| Open Datasets | No | The paper is theoretical and refers to a 'dataset D' generically as part of its theoretical model, but it does not specify or use any particular named, publicly available dataset for experiments. |
| Dataset Splits | No | The paper is theoretical and does not conduct empirical experiments, therefore it does not specify training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not mention any specific software names with version numbers that would be required to reproduce experiments. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters, training configurations, or system-level settings. |
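The Pseudocode row above points to the paper's Algorithm 1, ComparePair(D, H, δ), which uses a statistical hypothesis test to decide whether a coarser abstraction is consistent with a finer one. The sketch below is not the paper's procedure; it only illustrates the general confidence-interval idea under simplifying assumptions (Hoeffding intervals on scalar value estimates, bounded in [0, 1]). The names `hoeffding_radius`, `compare_pair`, and `fine_estimates` are illustrative inventions, not identifiers from the paper.

```python
import math

def hoeffding_radius(n, delta, value_range=1.0):
    """Hoeffding confidence radius for the mean of n i.i.d. samples in [0, value_range]."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def compare_pair(fine_estimates, delta):
    """Illustrative consistency test between a coarse abstraction and a finer one.

    fine_estimates: list of (sample_mean, sample_count) pairs for the
    fine-abstraction states that the coarse abstraction merges into one state.
    Returns True when all confidence intervals share a common point, i.e. the
    data cannot reject the hypothesis that the merged states behave identically,
    so the coarser (lower-variance) abstraction passes the test.
    """
    intervals = [(m - hoeffding_radius(n, delta), m + hoeffding_radius(n, delta))
                 for m, n in fine_estimates]
    lo = max(left for left, _ in intervals)   # highest lower bound
    hi = min(right for _, right in intervals)  # lowest upper bound
    return lo <= hi  # overlap => coarse abstraction is not rejected
```

For example, `compare_pair([(0.5, 100), (0.52, 100)], 0.05)` accepts the merge (the intervals overlap), while `compare_pair([(0.1, 1000), (0.9, 1000)], 0.05)` rejects it. The paper's actual test and its finite-sample loss bound are stated over full abstract models, not scalar means as assumed here.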