Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Imitation Learning with Demonstrations and Shaping Rewards

Authors: Kshitij Judah, Alan Fern, Prasad Tadepalli, Robby Goetschalckx

AAAI 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate SHAIL in three domains: Car driving, Cartpole, and an IL formulation of handwriting recognition. Figure 1 shows the results when learning to imitate Driving-Expert1.
Researcher Affiliation	Academia	Kshitij Judah and Alan Fern and Prasad Tadepalli and Robby Goetschalckx School of EECS, Oregon State University, Corvallis, OR, USA, 97331
Pseudocode	Yes	Algorithm 1 Pseudocode for Shaped IL (SHAIL), Algorithm 2 Optimizer for VRs(θ) + λL(θ, D)
Open Source Code	No	No explicit statement or link is provided for open-source code release for the described methodology.
Open Datasets	Yes	Handwriting Recognition. We applied SHAIL to handwriting recognition using the data set from Taskar et al. (Taskar, Guestrin, and Koller 2004).
Dataset Splits	Yes	The dataset has 6600 words divided into 10 folds. We used the first fold to produce demonstration data and the remaining folds as test data.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are mentioned in the paper.
Software Dependencies	No	Software components like 'OLPOMDP' and 'linear logistic regression classifier' are mentioned, but no specific version numbers for any software or libraries are provided.
Experiment Setup	Yes	Algorithm 2... Set learning rate α, discount factor γ; run an episode in the MDP; for some # of iterations do; We allow each episode to run for 200 time steps; In our experiments, we use C = 1.
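
Illustration of the Dataset Splits row: the split quoted above (6600 words in 10 folds, first fold for demonstrations, remaining folds for testing) can be sketched as follows. This is a minimal sketch, not the authors' code. The function name split_folds and the contiguous, equal-sized fold assignment are assumptions made for illustration; the record does not say how words are assigned to folds.

```python
def split_folds(n_items, n_folds=10, demo_fold=0):
    """Return (demo_indices, test_indices) for a fold-based split.

    Assumption: folds are contiguous and equal-sized. The source only
    states the number of words and folds, not how they are assigned.
    """
    fold_size = n_items // n_folds
    folds = [
        list(range(k * fold_size, (k + 1) * fold_size))
        for k in range(n_folds)
    ]
    # Any leftover items (none for 6600 / 10) go into the last fold.
    folds[-1].extend(range(n_folds * fold_size, n_items))

    # One fold supplies demonstration data; all others are test data.
    demo = folds[demo_fold]
    test = [i for k, fold in enumerate(folds) if k != demo_fold for i in fold]
    return demo, test


demo, test = split_folds(6600)
print(len(demo), len(test))  # 660 5940
```

Under these assumptions, one tenth of the words supply demonstrations and the other nine tenths are held out for testing, which is the reverse of the usual train-heavy split.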