Agnostic System Identification for Monte Carlo Planning
Author: Erik Talvitie
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A simple empirical illustration (Section 5) demonstrates that these issues can arise in practice and that DAgger-MC can perform well in cases where DAgger fails. ... 5 Experiments: Consider the simple game, Shooter, which is shown in Figure 3. ... The results of applying DAgger and DAgger-MC to this problem (using various expert policies to generate ν) are shown in Figure 2. ... The discounted return obtained by the policy generated at each iteration in an episode of length 30 is reported, averaged over 200 trials. |
| Researcher Affiliation | Academia | Erik Talvitie, Mathematics and Computer Science, Franklin & Marshall College, erik.talvitie@fandm.edu |
| Pseudocode | Yes | Algorithm 1 DAgger for Model-Based RL ... Algorithm 2 DAgger-MC for one-ply Monte Carlo Planning |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper describes a custom 'Shooter' game environment for experiments, but does not provide concrete access information or citations for a publicly available dataset used for training. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages or sample counts for train/validation/test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions learning algorithms like 'Context Tree Switching' but does not provide specific ancillary software details with version numbers (e.g., programming languages, libraries, or solvers). |
| Experiment Setup | Yes | In all cases the planning algorithm was one-ply Monte Carlo with 50 rollouts of length 15 of the uniform random policy. ... The discount factor γ was set to 0.9. Each iteration generated a training batch of 500 samples. |
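
The "Experiment Setup" row pins the planner down completely: one-ply Monte Carlo with 50 rollouts of length 15 under a uniform random rollout policy, with γ = 0.9. Below is a minimal Python sketch of such a planner. The `model.step(state, action) -> (next_state, reward)` interface is an assumed generative-model API, and running all 50 rollouts per candidate action is an assumption, since the paper does not state how rollouts are allocated among actions.

```python
import random

def one_ply_monte_carlo(model, state, actions, n_rollouts=50, depth=15, gamma=0.9):
    """Choose the action whose mean discounted return, estimated by rollouts
    in the learned model, is highest (n_rollouts per candidate action)."""
    best_action, best_value = None, float("-inf")
    for first_action in actions:
        total = 0.0
        for _ in range(n_rollouts):
            s, ret, discount = state, 0.0, 1.0
            a = first_action
            for _ in range(depth):
                s, r = model.step(s, a)      # assumed generative-model interface
                ret += discount * r
                discount *= gamma
                a = random.choice(actions)   # uniform random rollout policy
            total += ret
        mean_return = total / n_rollouts
        if mean_return > best_value:
            best_action, best_value = first_action, mean_return
    return best_action
```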
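
The "Pseudocode" row names Algorithm 1 (DAgger for Model-Based RL) and Algorithm 2 (DAgger-MC), which this table cannot reproduce. The sketch below is only a schematic of the data-aggregation pattern those algorithms share, not a transcription of either: the `env.reset`/`env.step` interface, the 50/50 mixing of the exploration distribution ν with the current policy's state distribution, and the helper `run_policy_to_random_state` are all illustrative assumptions. It reuses `one_ply_monte_carlo` from the sketch above and matches the paper's stated batch size of 500 samples per iteration.

```python
import random

def run_policy_to_random_state(env, policy, max_steps=30):
    # Hypothetical helper: follow the current policy from the start state
    # for a random number of steps and return the state reached.
    s = env.reset()
    for _ in range(random.randrange(max_steps)):
        s, _ = env.step(s, policy(s))
    return s

def dagger_style_loop(env, learn_model, nu, actions, n_iters=10, batch_size=500):
    """Schematic of the aggregate-then-replan loop behind Algorithms 1 and 2;
    it omits the paper's specific reset and mixing details."""
    dataset = []
    policy = lambda s: random.choice(actions)   # initial (random) policy
    for _ in range(n_iters):
        for _ in range(batch_size):
            # Mix the exploration distribution nu with states visited by
            # the current planner's policy (the 50/50 split is an assumption).
            s = nu() if random.random() < 0.5 else run_policy_to_random_state(env, policy)
            a = random.choice(actions)
            s_next, r = env.step(s, a)          # assumed environment interface
            dataset.append((s, a, r, s_next))
        model = learn_model(dataset)            # retrain on all aggregated data
        policy = lambda s, m=model: one_ply_monte_carlo(m, s, actions)
    return policy
```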