Keyframe-Focused Visual Imitation Learning

Authors: Chuan Wen, Jierui Lin, Jianing Qian, Yang Gao, Dinesh Jayaraman

ICML 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results demonstrate consistent performance improvements over all baselines on image-based Gym MuJoCo continuous control tasks. Finally, on the CARLA photorealistic vision-based urban driving simulator, we resolve a long-standing issue in behavioral cloning for driving by demonstrating effective imitation from observation histories. We now comprehensively evaluate our approach on a photorealistic driving simulator, CARLA (Dosovitskiy et al., 2017), and three image-based OpenAI Gym MuJoCo robotics environments.
Researcher Affiliation Academia 1Institute for Interdisciplinary Information Sciences, Tsinghua University 2University of Texas at Austin 3University of Pennsylvania 4Shanghai Qi Zhi Institute.
Pseudocode Yes Algorithm 1 summarizes our complete approach.
Open Source Code Yes Supplementary materials and code at: https://tinyurl.com/imitation-keyframes.
Open Datasets Yes CARLA is a photorealistic urban driving simulator with varying road and traffic conditions. It has recently emerged as a standard testbed for visual imitation learning, through the publicly available 100-hour CARLA100 driving dataset (Codevilla et al., 2019). We evaluate our method in three standard OpenAI Gym MuJoCo continuous control environments: Hopper, HalfCheetah and Walker2D.
Dataset Splits No The paper mentions 'validation data' and 'held-out data' and refers to a 'validation driving trajectory' in Figure 3, but it does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification No The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only implies the use of compute resources for training deep learning models.
Software Dependencies No The paper mentions software components like CARLA, OpenAI Gym, MuJoCo, ResNet-34, and TRPO, but it does not specify version numbers for these or other software dependencies required to replicate the experiment.
Experiment Setup Yes Our copycat policy network ψ is a two-layer MLP. For f(·), we experiment with softmax and step functions. The softmax function is applied within each training minibatch, i.e., w_i = e^{τ·APE_i} / Σ_j e^{τ·APE_j}; the temperature τ is a hyperparameter. The step function assigns a constant weight w_i = W to samples in the top THR percentile of APE and w_i = 1 otherwise; W and THR are hyperparameters. All hyperparameters were set through a simple grid search; see Supp for details. All policies using observation histories are trained by stacking image observations along the channels dimension. Architectural details are environment-specific and discussed in Sec 5. We set history size H = 6.
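The two sample-weighting schemes quoted above can be sketched as follows. This is a minimal NumPy illustration written from the equations in the text, not the authors' released code; the function names, the example APE values, and the specific hyperparameter settings are our own.

```python
import numpy as np

def softmax_weights(ape, tau):
    """Softmax weighting within a minibatch:
    w_i = exp(tau * APE_i) / sum_j exp(tau * APE_j)."""
    z = tau * ape - np.max(tau * ape)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def step_weights(ape, W, thr):
    """Step weighting: constant weight W for samples in the
    top `thr` percentile of APE, weight 1 otherwise."""
    cutoff = np.percentile(ape, 100 - thr)
    return np.where(ape >= cutoff, W, 1.0)

# Hypothetical per-sample action prediction errors for one minibatch.
ape = np.array([0.1, 0.5, 0.2, 0.9])
w_soft = softmax_weights(ape, tau=2.0)   # normalized within the minibatch
w_step = step_weights(ape, W=5.0, thr=25)  # top 25% of APE get weight 5
```

Samples whose actions are hardest to predict from history (high APE, i.e., likely keyframes) receive larger loss weights under either scheme; τ (or W and THR) controls how sharply the weighting concentrates on them.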