The Dependence of Effective Planning Horizon on Model Accuracy

Authors: Nan Jiang, Alex Kulesza, Satinder Singh, Richard Lewis

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We prove a planning loss bound predicting that shorter planning horizons can reduce overfitting and improve test performance, and we confirm these predictions empirically." "We now show experimentally that the phenomena predicted by the preceding theoretical discussion do, in fact, appear in practice."
Researcher Affiliation | Collaboration | Nan Jiang¹, Alex Kulesza¹, Satinder Singh¹, and Richard Lewis². ¹Computer Science and Engineering, University of Michigan; ²Department of Psychology, University of Michigan. nanjiang@umich.edu, kulesza@google.com, baveja@umich.edu, rickl@umich.edu
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code for its method is publicly available.
Open Datasets | No | "For these experiments we randomly sampled 1,000 MDPs with 10 states and 2 actions from a distribution we refer to as RANDOM-MDP, defined as follows. [...] For each generated MDP M, and for each value of n ∈ {5, 10, 20, 50}, we independently generated 1,000 data sets, each consisting of n trajectories of length 10 starting at uniformly random initial states and choosing uniformly random actions."
Dataset Splits | No | The paper mentions "training loss" and "test loss" but does not specify a separate validation split with percentages or counts, or a cross-validation setup.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | "For these experiments we randomly sampled 1,000 MDPs with 10 states and 2 actions from a distribution we refer to as RANDOM-MDP, defined as follows. [...] For all MDPs we fixed γ_eval = 0.99. For each generated MDP M, and for each value of n ∈ {5, 10, 20, 50}, we independently generated 1,000 data sets, each consisting of n trajectories of length 10 starting at uniformly random initial states and choosing uniformly random actions. If some (s, a) has never been seen in a dataset, we set R̂(s, a) = 0.5 and T̂(s, a, s′) = 1/|S|. For each value of γ ∈ {0, 0.1, 0.2, ..., 0.9, 0.99}, we compute the empirical loss."
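For reference, the setup quoted in the table above can be made concrete. The excerpt mentions the paper's planning loss bound and its "empirical loss" without restating the definition; the following is a hedged reconstruction (not a quote from the paper) of the certainty-equivalence planning loss this line of work studies, where π*_{M̂,γ} is the optimal policy of the estimated model M̂ under guidance discount γ and values are measured in the true MDP M at γ_eval:

\[
  \mathrm{Loss}(\widehat{M}, \gamma)
  \;=\;
  \Bigl\| \, V^{\pi^*_{M,\gamma_{\mathrm{eval}}}}_{M,\gamma_{\mathrm{eval}}}
  \;-\;
  V^{\pi^*_{\widehat{M},\gamma}}_{M,\gamma_{\mathrm{eval}}} \, \Bigr\|_{\infty}
\]

The Python sketch below simulates the pipeline described in the Open Datasets and Experiment Setup rows: sample an MDP, generate trajectories, fit a maximum-likelihood model, plan with each guidance discount γ, and evaluate the resulting policy at γ_eval = 0.99. The RANDOM-MDP distribution is only named in the excerpt ("defined as follows", with the definition elided), so the Dirichlet transitions, uniform mean rewards, and deterministic reward observations here are assumptions; the state/action counts, trajectory length, unseen-(s, a) defaults, and γ grid follow the quoted text.

import numpy as np

# Constants taken from the quoted setup.
N_STATES, N_ACTIONS = 10, 2
GAMMA_EVAL = 0.99
TRAJ_LEN = 10
GAMMAS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

def sample_random_mdp(rng):
    # Placeholder for the paper's RANDOM-MDP distribution (its definition is
    # not in the excerpt): Dirichlet transitions, uniform [0, 1] mean rewards.
    T = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
    R = rng.uniform(size=(N_STATES, N_ACTIONS))
    return T, R

def generate_dataset(T, R, n_traj, rng):
    # n trajectories of length 10, uniformly random start states and actions,
    # per the quoted text. Deterministic reward observations are an assumption.
    data = []
    for _ in range(n_traj):
        s = rng.integers(N_STATES)
        for _ in range(TRAJ_LEN):
            a = rng.integers(N_ACTIONS)
            s2 = rng.choice(N_STATES, p=T[s, a])
            data.append((s, a, R[s, a], s2))
            s = s2
    return data

def estimate_model(data):
    # Maximum-likelihood model; unseen (s, a) pairs get R_hat = 0.5 and a
    # uniform next-state distribution, exactly as stated in the quoted setup.
    counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    r_sum = np.zeros((N_STATES, N_ACTIONS))
    for s, a, r, s2 in data:
        counts[s, a, s2] += 1
        r_sum[s, a] += r
    n_sa = counts.sum(axis=2)
    T_hat = np.full_like(counts, 1.0 / N_STATES)
    R_hat = np.full_like(r_sum, 0.5)
    seen = n_sa > 0
    T_hat[seen] = counts[seen] / n_sa[seen][:, None]
    R_hat[seen] = r_sum[seen] / n_sa[seen]
    return T_hat, R_hat

def greedy_policy(T, R, gamma, iters=5000, tol=1e-12):
    # Optimal policy for discount gamma via value iteration.
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = R + gamma * (T @ V)          # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)

def policy_value(T, R, pi, gamma):
    # Exact policy evaluation: V = (I - gamma * T_pi)^{-1} R_pi.
    idx = np.arange(N_STATES)
    T_pi, R_pi = T[idx, pi], R[idx, pi]
    return np.linalg.solve(np.eye(N_STATES) - gamma * T_pi, R_pi)

rng = np.random.default_rng(0)
T, R = sample_random_mdp(rng)
V_opt = policy_value(T, R, greedy_policy(T, R, GAMMA_EVAL), GAMMA_EVAL)
data = generate_dataset(T, R, n_traj=10, rng=rng)   # n = 10 as one example
T_hat, R_hat = estimate_model(data)
for gamma in GAMMAS:
    pi = greedy_policy(T_hat, R_hat, gamma)          # plan in the estimated model
    loss = np.max(V_opt - policy_value(T, R, pi, GAMMA_EVAL))
    print(f"gamma = {gamma:4.2f}  planning loss = {loss:.4f}")

If the paper's predictions hold, repeating this over many sampled MDPs and data sets should show the loss minimized at an intermediate γ when n is small, with the best γ growing toward γ_eval as n increases.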