The Dependence of Effective Planning Horizon on Model Accuracy

Authors: Nan Jiang, Alex Kulesza, Satinder Singh, Richard Lewis

IJCAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We prove a planning loss bound predicting that shorter planning horizons can reduce overfitting and improve test performance, and we confirm these predictions empirically." "We now show experimentally that the phenomena predicted by the preceding theoretical discussion do, in fact, appear in practice."
Researcher Affiliation | Collaboration | Nan Jiang¹, Alex Kulesza¹, Satinder Singh¹, and Richard Lewis². ¹Computer Science and Engineering, University of Michigan; ²Department of Psychology, University of Michigan. nanjiang@umich.edu, kulesza@google.com, baveja@umich.edu, rickl@umich.edu
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any links to open-source code or explicitly state that the code for its method is publicly available.
Open Datasets | No | "For these experiments we randomly sampled 1,000 MDPs with 10 states and 2 actions from a distribution we refer to as RANDOM-MDP, defined as follows. [...] For each generated MDP M, and for each value of n ∈ {5, 10, 20, 50}, we independently generated 1,000 data sets, each consisting of n trajectories of length 10 starting at uniformly random initial states and choosing uniformly random actions."
Dataset Splits | No | The paper mentions "training loss" and "test loss" but does not specify a separate validation split with percentages or counts, or a cross-validation setup.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | Yes | "For these experiments we randomly sampled 1,000 MDPs with 10 states and 2 actions from a distribution we refer to as RANDOM-MDP, defined as follows. [...] For all MDPs we fixed γ_eval = 0.99. For each generated MDP M, and for each value of n ∈ {5, 10, 20, 50}, we independently generated 1,000 data sets, each consisting of n trajectories of length 10 starting at uniformly random initial states and choosing uniformly random actions. If some (s, a) has never been seen in a dataset, we set R̂(s, a) = 0.5 and T̂(s, a, s′) = 1/|S|. For each value of γ ∈ {0, 0.1, 0.2, ..., 0.9, 0.99}, we compute the empirical loss."
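For reference, the setup quoted in the table above can be made concrete. The excerpt mentions the paper's planning loss bound and its "empirical loss" without restating the definition; the following is a hedged reconstruction (not a quote from the paper) of the certainty-equivalence planning loss this line of work studies, where π*_{M̂,γ} is the optimal policy of the estimated model M̂ under guidance discount γ and values are measured in the true MDP M at γ_eval:

\[
  \mathrm{Loss}(\widehat{M}, \gamma)
  \;=\;
  \Bigl\| \, V^{\pi^*_{M,\gamma_{\mathrm{eval}}}}_{M,\gamma_{\mathrm{eval}}}
  \;-\;
  V^{\pi^*_{\widehat{M},\gamma}}_{M,\gamma_{\mathrm{eval}}} \, \Bigr\|_{\infty}
\]

The Python sketch below simulates the pipeline described in the Open Datasets and Experiment Setup rows: sample an MDP, generate trajectories, fit a maximum-likelihood model, plan with each guidance discount γ, and evaluate the resulting policy at γ_eval = 0.99. The RANDOM-MDP distribution is only named in the excerpt ("defined as follows", with the definition elided), so the Dirichlet transitions, uniform mean rewards, and deterministic reward observations here are assumptions; the state/action counts, trajectory length, unseen-(s, a) defaults, and γ grid follow the quoted text.

import numpy as np

# Constants taken from the quoted setup.
N_STATES, N_ACTIONS = 10, 2
GAMMA_EVAL = 0.99
TRAJ_LEN = 10
GAMMAS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

def sample_random_mdp(rng):
    # Placeholder for the paper's RANDOM-MDP distribution (its definition is
    # not in the excerpt): Dirichlet transitions, uniform [0, 1] mean rewards.
    T = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
    R = rng.uniform(size=(N_STATES, N_ACTIONS))
    return T, R

def generate_dataset(T, R, n_traj, rng):
    # n trajectories of length 10, uniformly random start states and actions,
    # per the quoted text. Deterministic reward observations are an assumption.
    data = []
    for _ in range(n_traj):
        s = rng.integers(N_STATES)
        for _ in range(TRAJ_LEN):
            a = rng.integers(N_ACTIONS)
            s2 = rng.choice(N_STATES, p=T[s, a])
            data.append((s, a, R[s, a], s2))
            s = s2
    return data

def estimate_model(data):
    # Maximum-likelihood model; unseen (s, a) pairs get R_hat = 0.5 and a
    # uniform next-state distribution, exactly as stated in the quoted setup.
    counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    r_sum = np.zeros((N_STATES, N_ACTIONS))
    for s, a, r, s2 in data:
        counts[s, a, s2] += 1
        r_sum[s, a] += r
    n_sa = counts.sum(axis=2)
    T_hat = np.full_like(counts, 1.0 / N_STATES)
    R_hat = np.full_like(r_sum, 0.5)
    seen = n_sa > 0
    T_hat[seen] = counts[seen] / n_sa[seen][:, None]
    R_hat[seen] = r_sum[seen] / n_sa[seen]
    return T_hat, R_hat

def greedy_policy(T, R, gamma, iters=5000, tol=1e-12):
    # Optimal policy for discount gamma via value iteration.
    V = np.zeros(N_STATES)
    for _ in range(iters):
        Q = R + gamma * (T @ V)          # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)

def policy_value(T, R, pi, gamma):
    # Exact policy evaluation: V = (I - gamma * T_pi)^{-1} R_pi.
    idx = np.arange(N_STATES)
    T_pi, R_pi = T[idx, pi], R[idx, pi]
    return np.linalg.solve(np.eye(N_STATES) - gamma * T_pi, R_pi)

rng = np.random.default_rng(0)
T, R = sample_random_mdp(rng)
V_opt = policy_value(T, R, greedy_policy(T, R, GAMMA_EVAL), GAMMA_EVAL)
data = generate_dataset(T, R, n_traj=10, rng=rng)   # n = 10 as one example
T_hat, R_hat = estimate_model(data)
for gamma in GAMMAS:
    pi = greedy_policy(T_hat, R_hat, gamma)          # plan in the estimated model
    loss = np.max(V_opt - policy_value(T, R, pi, GAMMA_EVAL))
    print(f"gamma = {gamma:4.2f}  planning loss = {loss:.4f}")

If the paper's predictions hold, repeating this over many sampled MDPs and data sets should show the loss minimized at an intermediate γ when n is small, with the best γ growing toward γ_eval as n increases.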