Keyframe-Focused Visual Imitation Learning
Authors: Chuan Wen, Jierui Lin, Jianing Qian, Yang Gao, Dinesh Jayaraman
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate consistent performance improvements over all baselines on image-based Gym MuJoCo continuous control tasks. Finally, on the CARLA photorealistic vision-based urban driving simulator, we resolve a long-standing issue in behavioral cloning for driving by demonstrating effective imitation from observation histories. We now comprehensively evaluate our approach on a photorealistic driving simulator, CARLA (Dosovitskiy et al., 2017), and three image-based OpenAI Gym MuJoCo robotics environments. |
| Researcher Affiliation | Academia | 1Institute for Interdisciplinary Information Sciences, Tsinghua University 2University of Texas at Austin 3University of Pennsylvania 4Shanghai Qi Zhi Institute. |
| Pseudocode | Yes | Algorithm 1 summarizes our complete approach. |
| Open Source Code | Yes | Supplementary materials and code at: https://tinyurl.com/imitation-keyframes. |
| Open Datasets | Yes | CARLA is a photorealistic urban driving simulator with varying road and traffic conditions. It has recently emerged as a standard testbed for visual imitation learning, through the publicly available 100-hour CARLA100 driving dataset (Codevilla et al., 2019). We evaluate our method in three standard OpenAI Gym MuJoCo continuous control environments: Hopper, HalfCheetah and Walker2D. |
| Dataset Splits | No | The paper mentions 'validation data' and 'held-out data' and refers to a 'validation driving trajectory' in Figure 3, but it does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only implies the use of compute resources for training deep learning models. |
| Software Dependencies | No | The paper mentions software components like CARLA, OpenAI Gym, MuJoCo, ResNet-34, and TRPO, but it does not specify version numbers for these or other software dependencies required to replicate the experiment. |
| Experiment Setup | Yes | Our copycat policy network ψ is a two-layer MLP. For f(·), we experiment with softmax and step functions. The softmax function is applied within each training minibatch, i.e., w_i = e^(τ·APE_i) / Σ_j e^(τ·APE_j); the temperature τ is a hyperparameter. The step function assigns a constant weight w_i = W to samples in the top THR percentile of APE and w_i = 1 otherwise; W and THR are hyperparameters. All hyperparameters were set through a simple grid search; see Supp for details. All policies using observation histories are trained by stacking image observations along the channels dimension. Architectural details are environment-specific and discussed in Sec 5. We set history size H = 6. (See the weighting-function sketch after this table.) |
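
The experiment-setup row describes two per-sample weighting functions f(·) over action prediction errors (APE). The following is a minimal, hedged sketch of those two functions, assuming APE values have already been computed for a minibatch; it is not the authors' released code, and all function and variable names here are illustrative.

```python
# Sketch of the two weighting schemes described in the Experiment Setup row.
# Assumes `ape` is a 1-D tensor of per-sample action prediction errors for one minibatch.
import torch

def softmax_weights(ape: torch.Tensor, tau: float) -> torch.Tensor:
    """w_i = exp(tau * APE_i) / sum_j exp(tau * APE_j), computed within the minibatch."""
    return torch.softmax(tau * ape, dim=0)

def step_weights(ape: torch.Tensor, thr_percentile: float, W: float) -> torch.Tensor:
    """w_i = W for samples in the top THR percentile of APE, and w_i = 1 otherwise."""
    cutoff = torch.quantile(ape, 1.0 - thr_percentile / 100.0)
    return torch.where(ape >= cutoff, torch.full_like(ape, W), torch.ones_like(ape))

# Illustrative usage: reweight a per-sample behavioral-cloning loss before reducing it.
# ape = per-sample copycat action prediction error (shape: [batch])
# bc_loss = per-sample imitation loss (shape: [batch])
# weighted_loss = (softmax_weights(ape, tau=1.0) * bc_loss).sum()
```

Note that the softmax weights sum to one within the minibatch, so a weighted sum is natural there, whereas the step weights would typically be averaged; τ, W, and THR correspond to the hyperparameters the paper reports tuning by grid search.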