Imitation Learning with Demonstrations and Shaping Rewards

Authors: Kshitij Judah, Alan Fern, Prasad Tadepalli, Robby Goetschalckx

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate SHAIL in three domains: Car driving, Cartpole, and an IL formulation of handwriting recognition. Figure 1 shows the results when learning to imitate Driving-Expert1.
Researcher Affiliation | Academia | Kshitij Judah and Alan Fern and Prasad Tadepalli and Robby Goetschalckx, School of EECS, Oregon State University, Corvallis, OR, USA, 97331
Pseudocode | Yes | Algorithm 1: Pseudocode for Shaped IL (SHAIL); Algorithm 2: Optimizer for VRs(θ) + λL(θ, D)
Open Source Code | No | The paper provides no explicit statement or link for an open-source release of the described methodology.
Open Datasets | Yes | Handwriting Recognition. We applied SHAIL to handwriting recognition using the data set from Taskar et al. (Taskar, Guestrin, and Koller 2004).
Dataset Splits | Yes | The dataset has 6600 words divided into 10 folds. We used the first fold to produce demonstration data and the remaining folds as test data.
Hardware Specification | No | The paper mentions no specific hardware details (e.g., GPU/CPU models, memory) for running the experiments.
Software Dependencies | No | Software components such as 'OLPOMDP' and a 'linear logistic regression classifier' are mentioned, but no version numbers are given for any software or libraries.
Experiment Setup | Yes | Algorithm 2... Set learning rate α, discount factor γ; run an episode in the MDP; for some # of iterations do; We allow each episode to run for 200 time steps; In our experiments, we use C = 1.
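
The split quoted under Dataset Splits (10 folds, the first used for demonstrations and the rest for testing) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `split_by_fold` and the round-robin fold assignment are assumptions made for the example, and the real data set ships with its own fold labels, which may not be equal in size.

```python
from typing import List, Sequence, Tuple


def split_by_fold(fold_ids: Sequence[int], demo_fold: int = 0) -> Tuple[List[int], List[int]]:
    """Return (demo_indices, test_indices) given one fold id per word.

    Words in `demo_fold` supply demonstration data; all other words are test data.
    """
    demo = [i for i, f in enumerate(fold_ids) if f == demo_fold]
    test = [i for i, f in enumerate(fold_ids) if f != demo_fold]
    return demo, test


# Hypothetical fold assignment for illustration only: 6600 words dealt
# round-robin into 10 folds. Real fold labels should be read from the data.
NUM_WORDS = 6600
NUM_FOLDS = 10
fold_ids = [i % NUM_FOLDS for i in range(NUM_WORDS)]

demo_idx, test_idx = split_by_fold(fold_ids, demo_fold=0)
print(len(demo_idx), len(test_idx))
```

Under the equal-size assumption, one fold holds a tenth of the words, so the demonstration set is much smaller than the test set.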