Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Active Imitation Learning: Formal and Practical Reductions to I.I.D. Learning

Authors: Kshitij Judah, Alan P. Fern, Thomas G. Dietterich, Prasad Tadepalli

JMLR 2014 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate RAIL-DA in five test domains and show that it is highly effective when used with an i.i.d. algorithm that takes the unlabeled data density into account. The rest of the paper is organized as follows. ... In Section 6, experimental results are presented. ... We conduct our empirical evaluation on five domains: 1) Cart-pole, 2) Bicycle, 3) Wargus, 4) Driving, and 5) NETtalk. Below we first describe the details of these domains. We then present various experiments that we performed in these domains.
Researcher Affiliation | Academia | Kshitij Judah EMAIL; Alan P. Fern EMAIL; Thomas G. Dietterich EMAIL; Prasad Tadepalli EMAIL. School of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331-5501, USA
Pseudocode | Yes | Algorithm 1: Active Forward Training; Algorithm 2: RAIL; Algorithm 3: RAIL+; Algorithm 4: Density-Weighted Query-By-Committee Algorithm; Algorithm 5: Procedure Sample Unlabeled Data; Algorithm 6: Procedure Sample Committee
Open Source Code | No | No explicit statement about providing source code or a link to a repository for the methodology described in this paper was found.
Open Datasets | Yes | We conduct our empirical evaluation on five domains: 1) Cart-pole, 2) Bicycle, 3) Wargus, 4) Driving, and 5) NETtalk. ... Cart-pole is a well-known RL benchmark. ... This domain is a variant of the RL benchmark of bicycle balancing and riding (Randløv and Alstrøm, 1998). ... We evaluate RAIL-DA on a particular implementation of the driving domain used by Cohn et al. (Cohn et al., 2011). ... We evaluate RAIL-DA on two structured prediction tasks, stress prediction and phoneme prediction, both based on the NETtalk data set (Dietterich et al., 2008). ... We would also like to thank Robert Cohn and Aaron Wilson for making driving and bicycle simulators available.
Dataset Splits | Yes | For Cart-pole: For each learner, we ran experiments from 150 random initial states close to the equilibrium start state ((x, ẋ, θ, θ̇) = (0.0, 0.0, 0.0, 0.0)). ... For Bicycle: we used a similar evaluation procedure as for cart-pole where we generated 150 random start states... For Wargus: we designed 21 battle maps differing in the initial unit positions, using 5 for training and 16 for testing. ... For Driving: we ran 100 different learning trials... For NETtalk: The NETtalk data set consists of 2000 words divided into 1000 training words and 1000 test words.
Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for running the experiments are provided in the paper.
Software Dependencies | No | For all the learners in the experiments that are presented in this section, we employed the Simple Logistic classifier from Weka (Hall et al., 2009) to learn policies over the set of features that were provided for each domain.
Experiment Setup | Yes | For Cart-pole: We let each episode run for a fixed length of 5000 time steps. ... For Bicycle: The goal is to balance a bicycle moving at a constant speed for 1000 time steps. ... In our experiments we use a committee of size 5. ... Our implementation uses K = 5. ... For Driving: we ran 100 different learning trials where during each trial the learner is allowed to pose a maximum of 500 (1000 in Experiment 1) queries... For NETtalk: In our experiments, we use L = 1, 2.
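The report notes that the paper provides pseudocode for a Density-Weighted Query-By-Committee routine (Algorithm 4) and uses a committee of size K = 5. The paper's exact procedure is not reproduced here; the sketch below only illustrates the general density-weighted QBC idea, where disagreement (scored here by vote entropy, an assumed choice) is multiplied by the unlabeled-data density so that queries favor informative points in dense regions. All function names are illustrative, not the paper's.

```python
import math

def vote_entropy(committee, x):
    """Disagreement score: entropy of the committee's vote distribution on x.

    `committee` is a list of callables mapping an input to a discrete label.
    Returns 0.0 when all members agree; larger values mean more disagreement.
    """
    votes = {}
    for clf in committee:
        label = clf(x)
        votes[label] = votes.get(label, 0) + 1
    n = len(committee)
    return -sum((c / n) * math.log(c / n) for c in votes.values())

def density_weighted_query(committee, unlabeled, density):
    """Select the unlabeled point maximizing disagreement x density.

    `density` is any callable estimating the unlabeled-data density at a
    point (e.g., a kernel density estimate in a real implementation).
    """
    return max(unlabeled, key=lambda x: vote_entropy(committee, x) * density(x))

# Toy usage: a committee of 5 threshold classifiers (matching the paper's
# committee size, but not its learners) disagrees most at x = 3, so that
# point is queried under a uniform density.
committee = [lambda x, t=t: int(x > t) for t in (1, 2, 3, 4, 5)]
query = density_weighted_query(committee, [0, 3, 10], lambda x: 1.0)
```

Under a non-uniform density, the same rule would shift queries toward regions where unlabeled data concentrates, which is the property the report highlights ("an i.i.d. algorithm that takes the unlabeled data density into account").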