The Dependence of Effective Planning Horizon on Model Accuracy
Authors: Nan Jiang, Alex Kulesza, Satinder Singh, Richard Lewis
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove a planning loss bound predicting that shorter planning horizons can reduce overfitting and improve test performance, and we confirm these predictions empirically. We now show experimentally that the phenomena predicted by the preceding theoretical discussion do, in fact, appear in practice. |
| Researcher Affiliation | Collaboration | Nan Jiang¹, Alex Kulesza¹, Satinder Singh¹ and Richard Lewis² (¹Computer Science and Engineering, University of Michigan; ²Department of Psychology, University of Michigan). nanjiang@umich.edu, kulesza@google.com, baveja@umich.edu, rickl@umich.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code for their method is publicly available. |
| Open Datasets | No | For these experiments we randomly sampled 1,000 MDPs with 10 states and 2 actions from a distribution we refer to as RANDOM-MDP, defined as follows. For each generated MDP M, and for each value of n ∈ {5, 10, 20, 50}, we independently generated 1,000 data sets, each consisting of n trajectories of length 10 starting at uniformly random initial states and choosing uniformly random actions. |
| Dataset Splits | No | The paper reports 'Training loss' and 'Test loss' but does not specify a separate validation split (with percentages or counts) or a cross-validation setup. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | For these experiments we randomly sampled 1,000 MDPs with 10 states and 2 actions from a distribution we refer to as RANDOM-MDP, defined as follows. For all MDPs we fixed γ_eval = 0.99. For each generated MDP M, and for each value of n ∈ {5, 10, 20, 50}, we independently generated 1,000 data sets, each consisting of n trajectories of length 10 starting at uniformly random initial states and choosing uniformly random actions. If some (s, a) has never been seen in a dataset, we set R̂(s, a) = 0.5 and T̂(s, a, s′) = 1/|S|. For each value of γ ∈ {0, 0.1, 0.2, …, 0.9, 0.99}, we compute the empirical loss. (A re-implementation sketch of this setup follows the table.) |
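
The quoted setup is detailed enough to sketch the data-generation protocol in code. The sketch below is ours, not the authors': the table truncates the paper's definition of RANDOM-MDP, so the Dirichlet transition distributions and uniform [0, 1] rewards are placeholder assumptions, and the helper names (`sample_random_mdp`, `generate_dataset`) are hypothetical. The rollout protocol (n trajectories of length 10, uniform start states, uniform random actions) follows the quote.

```python
import numpy as np

def sample_random_mdp(n_states=10, n_actions=2, rng=None):
    """Sample a tabular MDP standing in for RANDOM-MDP.

    ASSUMPTION: the quote cuts off the actual RANDOM-MDP definition, so
    Dirichlet transitions and uniform [0, 1] rewards are placeholders.
    """
    rng = rng if rng is not None else np.random.default_rng()
    T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # (S, A, S)
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # (S, A)
    return T, R

def generate_dataset(T, R, n_trajectories, horizon=10, rng=None):
    """n trajectories of length `horizon`, uniformly random start states
    and uniformly random actions, as quoted in the Experiment Setup row."""
    rng = rng if rng is not None else np.random.default_rng()
    n_states, n_actions = R.shape
    data = []  # list of (s, a, r, s_next) transitions
    for _ in range(n_trajectories):
        s = rng.integers(n_states)
        for _ in range(horizon):
            a = rng.integers(n_actions)
            s_next = rng.choice(n_states, p=T[s, a])
            data.append((s, a, R[s, a], s_next))
            s = s_next
    return data
```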
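Continuing the sketch, the certainty-equivalence estimation and the sweep over guidance discounts γ can be reconstructed from the quoted defaults (R̂(s, a) = 0.5 and T̂(s, a, s′) = 1/|S| for unseen pairs, γ_eval = 0.99). The planning-loss definition here, the mean value shortfall of the certainty-equivalence policy against the optimal policy when both are evaluated in the true MDP at γ_eval, is our reading of the paper's loss, not a verbatim quote, and all function names are again hypothetical. It reuses `sample_random_mdp` and `generate_dataset` from the sketch above.

```python
import numpy as np

def estimate_model(data, n_states=10, n_actions=2):
    """Tabular certainty-equivalence model. Unseen (s, a) pairs fall back
    to the quoted defaults: reward 0.5, uniform next-state distribution."""
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in data:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
    visits = counts.sum(axis=2)
    T_hat = np.full((n_states, n_actions, n_states), 1.0 / n_states)
    R_hat = np.full((n_states, n_actions), 0.5)
    seen = visits > 0
    T_hat[seen] = counts[seen] / visits[seen][:, None]
    R_hat[seen] = reward_sum[seen] / visits[seen]
    return T_hat, R_hat

def value_iteration(T, R, gamma, n_iters=5000, tol=1e-10):
    """Greedy policy for the model (T, R) under discount gamma."""
    V = np.zeros(T.shape[0])
    for _ in range(n_iters):
        Q = R + gamma * (T @ V)     # (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)

def policy_value(T, R, policy, gamma):
    """Exact value of a deterministic policy in the MDP (T, R)."""
    idx = np.arange(T.shape[0])
    T_pi, R_pi = T[idx, policy], R[idx, policy]
    return np.linalg.solve(np.eye(T.shape[0]) - gamma * T_pi, R_pi)

def planning_loss(T, R, data, gamma, gamma_eval=0.99):
    """Mean shortfall of the certainty-equivalence policy (planned with
    guidance discount gamma) versus the optimal policy, both evaluated in
    the true MDP at gamma_eval. Our reading of the paper's loss."""
    T_hat, R_hat = estimate_model(data, *R.shape)
    pi_hat = value_iteration(T_hat, R_hat, gamma)
    pi_star = value_iteration(T, R, gamma_eval)
    return float(np.mean(policy_value(T, R, pi_star, gamma_eval)
                         - policy_value(T, R, pi_hat, gamma_eval)))

# One replication of the quoted sweep: a single MDP, n in {5, 10, 20, 50},
# and the eleven guidance discounts gamma in {0, 0.1, ..., 0.9, 0.99}.
rng = np.random.default_rng(0)
T, R = sample_random_mdp(rng=rng)
gammas = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]
for n in (5, 10, 20, 50):
    data = generate_dataset(T, R, n, rng=rng)
    print(n, [round(planning_loss(T, R, data, g), 3) for g in gammas])
```

Under the paper's protocol this sweep would be repeated over 1,000 sampled MDPs and 1,000 data sets per (MDP, n) pair; the single-seed loop above is only meant to make the per-run computation concrete.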