Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Active Imitation Learning: Formal and Practical Reductions to I.I.D. Learning

Authors: Kshitij Judah, Alan P. Fern, Thomas G. Dietterich, Prasad Tadepalli

JMLR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate RAIL-DA in five test domains and show that it is highly effective when used with an i.i.d. algorithm that takes the unlabeled data density into account. The rest of the paper is organized as follows. ... In Section 6, experimental results are presented. ... We conduct our empirical evaluation on five domains: 1) Cart-pole, 2) Bicycle, 3) Wargus, 4) Driving, and 5) NETtalk. Below we first describe the details of these domains. We then present various experiments that we performed in these domains.
Researcher Affiliation | Academia | Kshitij Judah EMAIL; Alan P. Fern EMAIL; Thomas G. Dietterich EMAIL; Prasad Tadepalli EMAIL. School of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331-5501, USA
Pseudocode | Yes | Algorithm 1: Active Forward Training; Algorithm 2: RAIL; Algorithm 3: RAIL+; Algorithm 4: Density-Weighted Query-By-Committee Algorithm; Algorithm 5: Procedure Sample Unlabeled Data; Algorithm 6: Procedure Sample Committee
Open Source Code | No | No explicit statement about providing source code or a link to a repository for the methodology described in this paper was found.
Open Datasets | Yes | We conduct our empirical evaluation on five domains: 1) Cart-pole, 2) Bicycle, 3) Wargus, 4) Driving, and 5) NETtalk. ... Cart-pole is a well-known RL benchmark. ... This domain is a variant of the RL benchmark of bicycle balancing and riding (Randløv and Alstrøm, 1998). ... We evaluate RAIL-DA on a particular implementation of the driving domain used by Cohn et al. (Cohn et al., 2011). ... We evaluate RAIL-DA on two structured prediction tasks, stress prediction and phoneme prediction, both based on the NETtalk data set (Dietterich et al., 2008). ... We would also like to thank Robert Cohn and Aaron Wilson for making driving and bicycle simulators available.
Dataset Splits | Yes | For Cart-pole: For each learner, we ran experiments from 150 random initial states close to the equilibrium start state ((x, ẋ, θ, θ̇) = (0.0, 0.0, 0.0, 0.0)). ... For Bicycle: we used a similar evaluation procedure as for cart-pole where we generated 150 random start states... For Wargus: we designed 21 battle maps differing in the initial unit positions, using 5 for training and 16 for testing. ... For Driving: we ran 100 different learning trials... For NETtalk: The NETtalk data set consists of 2000 words divided into 1000 training words and 1000 test words.
Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments are provided in the paper.
Software Dependencies | No | For all the learners in the experiments that are presented in this section, we employed the Simple Logistic classifier from Weka (Hall et al., 2009) to learn policies over the set of features that were provided for each domain.
Experiment Setup | Yes | For Cart-pole: We let each episode run for a fixed length of 5000 time steps. ... For Bicycle: The goal is to balance a bicycle moving at a constant speed for 1000 time steps. ... In our experiments we use a committee of size 5. ... Our implementation uses K = 5. ... For Driving: we ran 100 different learning trials where during each trial the learner is allowed to pose a maximum of 500 (1000 in Experiment 1) queries... For NETtalk: In our experiments, we use L = 1, 2.
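The report notes that the paper provides pseudocode for a Density-Weighted Query-By-Committee routine (Algorithm 4) and uses a committee of size K = 5. The paper's exact procedure is not reproduced here; the sketch below only illustrates the general density-weighted QBC idea, where disagreement (scored here by vote entropy, an assumed choice) is multiplied by the unlabeled-data density so that queries favor informative points in dense regions. All function names are illustrative, not the paper's.

```python
import math

def vote_entropy(committee, x):
    """Disagreement score: entropy of the committee's vote distribution on x.

    `committee` is a list of callables mapping an input to a discrete label.
    Returns 0.0 when all members agree; larger values mean more disagreement.
    """
    votes = {}
    for clf in committee:
        label = clf(x)
        votes[label] = votes.get(label, 0) + 1
    n = len(committee)
    return -sum((c / n) * math.log(c / n) for c in votes.values())

def density_weighted_query(committee, unlabeled, density):
    """Select the unlabeled point maximizing disagreement x density.

    `density` is any callable estimating the unlabeled-data density at a
    point (e.g., a kernel density estimate in a real implementation).
    """
    return max(unlabeled, key=lambda x: vote_entropy(committee, x) * density(x))

# Toy usage: a committee of 5 threshold classifiers (matching the paper's
# committee size, but not its learners) disagrees most at x = 3, so that
# point is queried under a uniform density.
committee = [lambda x, t=t: int(x > t) for t in (1, 2, 3, 4, 5)]
query = density_weighted_query(committee, [0, 3, 10], lambda x: 1.0)
```

Under a non-uniform density, the same rule would shift queries toward regions where unlabeled data concentrates, which is the property the report highlights ("an i.i.d. algorithm that takes the unlabeled data density into account").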