Imitation Learning with Demonstrations and Shaping Rewards
Authors: Kshitij Judah, Alan Fern, Prasad Tadepalli, Robby Goetschalckx
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SHAIL in three domains: Car driving, Cartpole, and an IL formulation of handwriting recognition. Figure 1 shows the results when learning to imitate Driving-Expert1. |
| Researcher Affiliation | Academia | Kshitij Judah and Alan Fern and Prasad Tadepalli and Robby Goetschalckx School of EECS, Oregon State University, Corvallis, OR, USA, 97331 |
| Pseudocode | Yes | Algorithm 1: Pseudocode for Shaped IL (SHAIL); Algorithm 2: Optimizer for $V_{R_s}(\theta) + \lambda L(\theta, D)$ |
| Open Source Code | No | No explicit statement or link is provided for open-source code release for the described methodology. |
| Open Datasets | Yes | Handwriting Recognition. We applied SHAIL to handwriting recognition using the data set from Taskar et al. (Taskar, Guestrin, and Koller 2004). |
| Dataset Splits | Yes | The dataset has 6600 words divided into 10 folds. We used the first fold to produce demonstration data and the remaining folds as test data. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | Software components like 'OLPOMDP' and 'linear logistic regression classifier' are mentioned, but no specific version numbers for any software or libraries are provided. |
| Experiment Setup | Yes | Algorithm 2... Set learning rate α, discount factor γ; run an episode in the MDP; for some # of iterations do. We allow each episode to run for 200 time steps. In our experiments, we use C = 1. |
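
The Dataset Splits row describes 6600 words divided into 10 folds, with the first fold used for demonstration data and the remaining folds as test data. The sketch below shows one way such a split could be produced over word indices. It assumes equal-sized, contiguous folds, which the paper excerpt does not state. The helper names `split_folds` and `demo_test_split` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def split_folds(n_items: int, n_folds: int) -> List[List[int]]:
    """Partition indices 0..n_items-1 into n_folds contiguous folds.

    Assumption: folds are contiguous and as equal in size as possible.
    The source only states that there are 10 folds.
    """
    base, extra = divmod(n_items, n_folds)
    folds: List[List[int]] = []
    start = 0
    for k in range(n_folds):
        # The first `extra` folds absorb one leftover item each.
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def demo_test_split(n_items: int = 6600, n_folds: int = 10) -> Tuple[List[int], List[int]]:
    """Use the first fold for demonstrations and all remaining folds for testing."""
    folds = split_folds(n_items, n_folds)
    demo = folds[0]
    test = [i for fold in folds[1:] for i in fold]
    return demo, test


demo_idx, test_idx = demo_test_split()
print(len(demo_idx), len(test_idx))
```

Under the equal-fold assumption, the demonstration set is one tenth of the data and the test set is the remaining nine tenths, so far more words are held out for testing than are used to produce demonstrations.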