Behavioral Cloning from Observation

Authors: Faraz Torabi, Garrett Warnell, Peter Stone

IJCAI 2018

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available."

Researcher Affiliation | Collaboration | Faraz Torabi [1], Garrett Warnell [2], Peter Stone [1]; [1] The University of Texas at Austin, [2] U.S. Army Research Laboratory. {faraztrb,pstone}@cs.utexas.edu, garrett.a.warnell.civ@mail.mil

Pseudocode | Yes | Algorithm 1: BCO(α)

Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.

Open Datasets | Yes | "We evaluated BCO(α) in several domains available in OpenAI Gym [Brockman et al., 2016]. Continuous tasks are simulated by MuJoCo [Todorov et al., 2012]. These domains have different levels of difficulty, as measured by the complexity of the dynamics and the size and continuity of the state and action spaces. Ordered from easy to hard, the domains we considered are: CartPole, MountainCar, Reacher, and Ant-v1."

Dataset Splits | Yes | "Because both BC and BCO(α) rely on supervised learning methods, we use only 70% of the available data for training and use the rest for validation. We stop training when the error on the 30% validation data starts to increase."

Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.

Software Dependencies | No | The paper mentions software such as OpenAI Gym and MuJoCo, and optimization algorithms such as Adam, but does not provide version numbers for these or other software dependencies.

Experiment Setup | Yes | "Some details regarding particular choices made in this paper: For domains with a continuous action space, we assume a Gaussian distribution over each action dimension and our model estimates the individual means and standard deviations... We use a neural network for Mθ... In order to train this network... we use the Adam variant [Kingma and Ba, 2014] of stochastic gradient descent."
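As a concrete illustration of the training protocol quoted above (70% train / 30% validation, Adam optimization, stop when validation error starts to increase), here is a minimal sketch. This is not the authors' code: the linear model, dimensions, synthetic demonstration data, and hyperparameters are all hypothetical stand-ins for the paper's neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration data: 1000 (state, action) pairs generated
# by a noisy linear "expert". Dimensions are illustrative only.
S = rng.normal(size=(1000, 4))                        # states
W_true = rng.normal(size=(4, 2))
A = S @ W_true + 0.05 * rng.normal(size=(1000, 2))    # expert actions

# 70% of the data for training, the remaining 30% for validation,
# as described in the paper.
idx = rng.permutation(len(S))
cut = int(0.7 * len(S))
tr, va = idx[:cut], idx[cut:]

W = np.zeros((4, 2))                                  # linear policy weights
m, v = np.zeros_like(W), np.zeros_like(W)             # Adam moment estimates
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8

def mse(W, S, A):
    return float(np.mean((S @ W - A) ** 2))

best_val = np.inf
for t in range(1, 2001):
    # Mean-squared-error gradient on the training split.
    grad = 2 * S[tr].T @ (S[tr] @ W - A[tr]) / len(tr)
    # Adam update with bias-corrected first and second moments.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    W -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)

    # Early stopping: halt when the validation error starts to increase.
    val = mse(W, S[va], A[va])
    if val > best_val:
        break
    best_val = val

print(f"stopped at step {t}, validation MSE {best_val:.4f}")
```

The same loop applies whether the supervised target is the policy π or the inverse dynamics model Mθ; only the inputs and labels change.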