Agnostic System Identification for Monte Carlo Planning
Author: Erik Talvitie
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A simple empirical illustration (Section 5) demonstrates that these issues can arise in practice and that DAgger-MC can perform well in cases where DAgger fails. ... 5 Experiments: Consider the simple game, Shooter, which is shown in Figure 3. ... The results of applying DAgger and DAgger-MC to this problem (using various expert policies to generate ν) are shown in Figure 2. ... The discounted return obtained by the policy generated at each iteration in an episode of length 30 is reported, averaged over 200 trials. |
| Researcher Affiliation | Academia | Erik Talvitie, Mathematics and Computer Science, Franklin & Marshall College, erik.talvitie@fandm.edu |
| Pseudocode | Yes | Algorithm 1 DAgger for Model-Based RL ... Algorithm 2 DAgger-MC for one-ply Monte Carlo Planning |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes a custom 'Shooter' game environment for experiments, but does not provide concrete access information or citations for a publicly available dataset used for training. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages or sample counts for train/validation/test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions learning algorithms like 'Context Tree Switching' but does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | In all cases the planning algorithm was one-ply Monte Carlo with 50 rollouts of length 15 of the uniform random policy. ... The discount factor γ was set to 0.9. Each iteration generated a training batch of 500 samples. |
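
The "Experiment Setup" row pins the planner down completely: one-ply Monte Carlo with 50 rollouts of length 15 under a uniform random rollout policy, with γ = 0.9. Below is a minimal Python sketch of such a planner. The `model.step(state, action) -> (next_state, reward)` interface is an assumed generative-model API, and running all 50 rollouts per candidate action is an assumption, since the paper does not state how rollouts are allocated among actions.

```python
import random

def one_ply_monte_carlo(model, state, actions, n_rollouts=50, depth=15, gamma=0.9):
    """Choose the action whose mean discounted return, estimated by rollouts
    in the learned model, is highest (n_rollouts per candidate action)."""
    best_action, best_value = None, float("-inf")
    for first_action in actions:
        total = 0.0
        for _ in range(n_rollouts):
            s, ret, discount = state, 0.0, 1.0
            a = first_action
            for _ in range(depth):
                s, r = model.step(s, a)      # assumed generative-model interface
                ret += discount * r
                discount *= gamma
                a = random.choice(actions)   # uniform random rollout policy
            total += ret
        mean_return = total / n_rollouts
        if mean_return > best_value:
            best_action, best_value = first_action, mean_return
    return best_action
```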
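
The "Pseudocode" row names Algorithm 1 (DAgger for Model-Based RL) and Algorithm 2 (DAgger-MC), which this table cannot reproduce. The sketch below is only a schematic of the data-aggregation pattern those algorithms share, not a transcription of either: the `env.reset`/`env.step` interface, the 50/50 mixing of the exploration distribution ν with the current policy's state distribution, and the helper `run_policy_to_random_state` are all illustrative assumptions. It reuses `one_ply_monte_carlo` from the sketch above and matches the paper's stated batch size of 500 samples per iteration.

```python
import random

def run_policy_to_random_state(env, policy, max_steps=30):
    # Hypothetical helper: follow the current policy from the start state
    # for a random number of steps and return the state reached.
    s = env.reset()
    for _ in range(random.randrange(max_steps)):
        s, _ = env.step(s, policy(s))
    return s

def dagger_style_loop(env, learn_model, nu, actions, n_iters=10, batch_size=500):
    """Schematic of the aggregate-then-replan loop behind Algorithms 1 and 2;
    it omits the paper's specific reset and mixing details."""
    dataset = []
    policy = lambda s: random.choice(actions)   # initial (random) policy
    for _ in range(n_iters):
        for _ in range(batch_size):
            # Mix the exploration distribution nu with states visited by
            # the current planner's policy (the 50/50 split is an assumption).
            s = nu() if random.random() < 0.5 else run_policy_to_random_state(env, policy)
            a = random.choice(actions)
            s_next, r = env.step(s, a)          # assumed environment interface
            dataset.append((s, a, r, s_next))
        model = learn_model(dataset)            # retrain on all aggregated data
        policy = lambda s, m=model: one_ply_monte_carlo(m, s, actions)
    return policy
```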