Behavioral Cloning from Observation

Authors: Faraz Torabi, Garrett Warnell, Peter Stone

IJCAI 2018

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available."

Researcher Affiliation | Collaboration | Faraz Torabi [1], Garrett Warnell [2], Peter Stone [1]; [1] The University of Texas at Austin, [2] U.S. Army Research Laboratory. {faraztrb,pstone}@cs.utexas.edu, garrett.a.warnell.civ@mail.mil

Pseudocode | Yes | Algorithm 1: BCO(α)

Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.

Open Datasets | Yes | "We evaluated BCO(α) in several domains available in OpenAI Gym [Brockman et al., 2016]. Continuous tasks are simulated by MuJoCo [Todorov et al., 2012]. These domains have different levels of difficulty, as measured by the complexity of the dynamics and the size and continuity of the state and action spaces. Ordered from easy to hard, the domains we considered are: CartPole, MountainCar, Reacher, and Ant-v1."

Dataset Splits | Yes | "Because both BC and BCO(α) rely on supervised learning methods, we use only 70% of the available data for training and use the rest for validation. We stop training when the error on the 30% validation data starts to increase."

Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.

Software Dependencies | No | The paper mentions software such as OpenAI Gym and MuJoCo, and optimization algorithms such as Adam, but does not provide version numbers for these or other software dependencies.

Experiment Setup | Yes | "Some details regarding particular choices made in this paper: For domains with a continuous action space, we assume a Gaussian distribution over each action dimension and our model estimates the individual means and standard deviations... We use a neural network for Mθ... In order to train this network... we use the Adam variant [Kingma and Ba, 2014] of stochastic gradient descent."
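As a concrete illustration of the training protocol quoted above (70% train / 30% validation, Adam optimization, stop when validation error starts to increase), here is a minimal sketch. This is not the authors' code: the linear model, dimensions, synthetic demonstration data, and hyperparameters are all hypothetical stand-ins for the paper's neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration data: 1000 (state, action) pairs generated
# by a noisy linear "expert". Dimensions are illustrative only.
S = rng.normal(size=(1000, 4))                        # states
W_true = rng.normal(size=(4, 2))
A = S @ W_true + 0.05 * rng.normal(size=(1000, 2))    # expert actions

# 70% of the data for training, the remaining 30% for validation,
# as described in the paper.
idx = rng.permutation(len(S))
cut = int(0.7 * len(S))
tr, va = idx[:cut], idx[cut:]

W = np.zeros((4, 2))                                  # linear policy weights
m, v = np.zeros_like(W), np.zeros_like(W)             # Adam moment estimates
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8

def mse(W, S, A):
    return float(np.mean((S @ W - A) ** 2))

best_val = np.inf
for t in range(1, 2001):
    # Mean-squared-error gradient on the training split.
    grad = 2 * S[tr].T @ (S[tr] @ W - A[tr]) / len(tr)
    # Adam update with bias-corrected first and second moments.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    W -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)

    # Early stopping: halt when the validation error starts to increase.
    val = mse(W, S[va], A[va])
    if val > best_val:
        break
    best_val = val

print(f"stopped at step {t}, validation MSE {best_val:.4f}")
```

The same loop applies whether the supervised target is the policy π or the inverse dynamics model Mθ; only the inputs and labels change.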