Agnostic System Identification for Monte Carlo Planning

Authors: Erik Talvitie

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A simple empirical illustration (Section 5) demonstrates that these issues can arise in practice and that DAgger-MC can perform well in cases where DAgger fails. ... 5 Experiments Consider the simple game, Shooter, which is shown in Figure 3. ... The results of applying DAgger and DAgger-MC to this problem (using various expert policies to generate ν) are shown in Figure 2. ... The discounted return obtained by the policy generated at each iteration in an episode of length 30 is reported, averaged over 200 trials.
Researcher Affiliation | Academia | Erik Talvitie, Mathematics and Computer Science, Franklin & Marshall College, erik.talvitie@fandm.edu
Pseudocode | Yes | Algorithm 1 DAgger for Model-Based RL ... Algorithm 2 DAgger-MC for one-ply Monte Carlo Planning
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper describes a custom 'Shooter' game environment for experiments, but does not provide concrete access information or citations for a publicly available dataset used for training.
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages or sample counts for train/validation/test sets).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions learning algorithms like 'Context Tree Switching' but does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | In all cases the planning algorithm was one-ply Monte Carlo with 50 rollouts of length 15 of the uniform random policy. ... The discount factor γ was set to 0.9. Each iteration generated a training batch of 500 samples.
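
The planner named in the Experiment Setup row can be illustrated with a minimal Python sketch. This is not the paper's code, which is not available. The names `rollout_return`, `one_ply_mc_plan`, and the `model(state, action, rng)` interface are hypothetical. The sketch also assumes that the 50 rollouts are run per candidate action and that the rollout length of 15 counts the first action; the quoted text does not settle either point.

```python
import random


def rollout_return(model, state, first_action, actions, depth, gamma, rng):
    """Discounted return of one rollout through a (learned) model.

    Takes `first_action`, then follows the uniform random policy.
    `model(state, action, rng)` is a hypothetical interface that
    returns a `(next_state, reward)` pair.
    """
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(depth):
        state, reward = model(state, action, rng)
        total += discount * reward
        discount *= gamma
        action = rng.choice(actions)  # uniform random rollout policy
    return total


def one_ply_mc_plan(model, state, actions, num_rollouts=50, depth=15,
                    gamma=0.9, rng=None):
    """Return the action with the highest average rollout return.

    Defaults mirror the reported setup (50 rollouts, length 15,
    gamma 0.9); running 50 rollouts per action is an assumption.
    """
    rng = rng or random.Random()

    def q_estimate(action):
        returns = [
            rollout_return(model, state, action, actions, depth, gamma, rng)
            for _ in range(num_rollouts)
        ]
        return sum(returns) / num_rollouts

    return max(actions, key=q_estimate)
```

Calling `one_ply_mc_plan(model, state, actions)` with a model learned from the current batch gives the action the planner would take in `state`. Passing a seeded `random.Random` makes the rollouts repeatable.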