Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Imitation Learning with Demonstrations and Shaping Rewards
Authors: Kshitij Judah, Alan Fern, Prasad Tadepalli, Robby Goetschalckx
AAAI 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SHAIL in three domains: Car driving, Cartpole, and an IL formulation of handwriting recognition. Figure 1 shows the results when learning to imitate Driving-Expert1. |
| Researcher Affiliation | Academia | Kshitij Judah and Alan Fern and Prasad Tadepalli and Robby Goetschalckx School of EECS, Oregon State University, Corvallis, OR, USA, 97331 |
| Pseudocode | Yes | Algorithm 1 Pseudocode for Shaped IL (SHAIL), Algorithm 2 Optimizer for VRs(θ) + λL(θ, D) |
| Open Source Code | No | No explicit statement or link is provided for open-source code release for the described methodology. |
| Open Datasets | Yes | Handwriting Recognition. We applied SHAIL to handwriting recognition using the data set from Tasker et al. (Taskar, Guestrin, and Koller 2004). |
| Dataset Splits | Yes | The dataset has 6600 words divided into 10 folds. We used the first fold to produce demonstration data and the remaining folds as test data. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | Software components like 'OLPOMDP' and 'linear logistic regression classifier' are mentioned, but no specific version numbers for any software or libraries are provided. |
| Experiment Setup | Yes | Algorithm 2... Set learning rate α, discount factor γ; run an episode in the MDP; for some # of iterations do; We allow each episode to run for 200 time steps.; In our experiments, we use C = 1. |
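
The Dataset Splits row describes 6600 words in 10 folds, with the first fold used for demonstrations and the remaining folds for testing. The following is a minimal sketch of that split, not the authors' code. It assumes equal-sized, contiguous folds over word indices; the actual fold membership comes from the dataset itself and may differ. The function name `split_folds` and its parameters are hypothetical.

```python
def split_folds(n_items, n_folds=10, demo_fold=0):
    """Return (demo_indices, test_indices) for a one-fold-vs-rest split.

    Assumption: folds are contiguous index ranges of equal size. The real
    data set may assign words to folds differently.
    """
    fold_size = n_items // n_folds
    folds = [
        list(range(k * fold_size, (k + 1) * fold_size))
        for k in range(n_folds)
    ]
    # Any leftover items (when n_items is not divisible) go to the last fold.
    folds[-1].extend(range(n_folds * fold_size, n_items))

    demo = folds[demo_fold]
    test = [i for k, fold in enumerate(folds) if k != demo_fold for i in fold]
    return demo, test


# 6600 words, 10 folds: the first fold supplies demonstrations, the rest are test data.
demo_idx, test_idx = split_folds(6600)
print(len(demo_idx), len(test_idx))
```

Under these assumptions, one tenth of the words serve as demonstration data and nine tenths as test data, which inverts the usual train/test proportions and matches the imitation learning setting where demonstrations are scarce.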