Imitation Learning from Video by Leveraging Proprioception

Authors: Faraz Torabi, Garrett Warnell, Peter Stone

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally test the proposed technique on several MuJoCo domains and show that it outperforms other imitation from observation algorithms by a large margin. In this section, we describe the experimental procedure by which we evaluated this hypothesis, and discuss the results.
Researcher Affiliation | Collaboration | Faraz Torabi¹, Garrett Warnell², and Peter Stone¹; ¹The University of Texas at Austin, ²Army Research Laboratory; {faraztrb, pstone}@cs.utexas.edu, garrett.a.warnell.civ@mail.mil
Pseudocode | Yes | Pseudocode and a diagrammatic representation of our proposed algorithm are presented in Algorithm 1 and Figure 1, respectively.
Open Source Code | No | The paper states: 'The considered domains, methods, and implementations are presented in more detail in the longer version of the paper on arXiv [Torabi et al., 2019c].' This refers to an arXiv preprint of this very paper, not a source code repository.
Open Datasets | Yes | We evaluated our method on a subset of the continuous control tasks available via OpenAI Gym [Brockman et al., 2016] and the MuJoCo simulator [Todorov et al., 2012]: MountainCarContinuous, InvertedPendulum, InvertedDoublePendulum, Hopper, Walker2d, HalfCheetah. After the expert agents were trained, we recorded 64×64, 30-fps video demonstrations of their behavior. (A hedged sketch of how such demonstrations could be recorded appears after the table.)
Dataset Splits | No | The paper mentions generating results using 'ten independent trials' and measuring performance over '1000 trajectories', but it does not specify explicit training, validation, or test splits for the demonstration data used in the imitation learning process.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions using 'OpenAI Gym', the 'MuJoCo simulator', and 'PPO', but does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | To generate the demonstration data, we first trained expert agents using pure reinforcement learning (i.e., not from imitation). More specifically, we used proximal policy optimization (PPO) [Schulman et al., 2017] and the ground-truth reward function provided by OpenAI Gym. After the expert agents were trained, we recorded 64×64, 30-fps video demonstrations of their behavior. The results shown here were generated using ten independent trials, where each trial used a different random seed to initialize the environments, model parameters, etc. (A hedged sketch of this setup follows the table.)
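
The following is a minimal sketch of how the listed environments might be instantiated and how 64×64 video demonstrations could be recorded from a trained expert. The classic Gym API, the environment version suffixes (e.g., Hopper-v2), and the use of OpenCV for resizing are assumptions; the paper does not state these implementation details, and the effective frame rate depends on the simulator's control timestep.

```python
# Hedged sketch (not from the paper): instantiating the evaluation environments
# and recording 64x64 RGB video demonstrations from a trained expert.
# Assumptions: classic Gym API (gym<=0.25), -v0/-v2 environment suffixes,
# and OpenCV for resizing.
import gym
import cv2
import numpy as np

ENV_IDS = [
    "MountainCarContinuous-v0",
    "InvertedPendulum-v2",
    "InvertedDoublePendulum-v2",
    "Hopper-v2",
    "Walker2d-v2",
    "HalfCheetah-v2",
]

def record_video_demo(env_id, expert_policy, max_steps=1000):
    """Roll out an expert and return a (T, 64, 64, 3) array of video frames."""
    env = gym.make(env_id)
    obs = env.reset()
    frames = []
    for _ in range(max_steps):
        frame = env.render(mode="rgb_array")        # full-resolution RGB frame
        frames.append(cv2.resize(frame, (64, 64)))  # downsample to 64x64
        action = expert_policy(obs)                 # expert maps state -> action
        obs, _, done, _ = env.step(action)
        if done:
            break
    env.close()
    return np.asarray(frames)
```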
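
Similarly, a minimal sketch of the expert-training and ten-seed protocol described in the Experiment Setup row. The use of stable-baselines3 as the PPO implementation and the one-million-timestep budget are assumptions chosen for illustration; the paper only states that PPO was trained on the ground-truth Gym reward and that ten independently seeded trials were run.

```python
# Hedged sketch (not from the paper): training a PPO expert on the ground-truth
# Gym reward and repeating the run over ten independent random seeds.
# Assumptions: stable-baselines3 as the PPO implementation, classic Gym seeding
# API, and a 1M-timestep budget.
import gym
from stable_baselines3 import PPO

def run_trial(env_id, seed, total_timesteps=1_000_000):
    """One independent trial: seed the environment and model, then train PPO."""
    env = gym.make(env_id)
    env.seed(seed)                                   # classic Gym seeding API
    model = PPO("MlpPolicy", env, seed=seed, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    return model

# Ten independent trials, each with a different random seed for the
# environment, model parameters, etc.
experts = [run_trial("Hopper-v2", seed) for seed in range(10)]
```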