Adversarial Option-Aware Hierarchical Imitation Learning
Authors: Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei Li
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method on several robotic locomotion and manipulation tasks against state-of-the-art HIL/IL baselines. The results demonstrate that our approach attains both dramatically faster convergence and better final performance than its counterparts. A complete set of ablation studies also verifies the validity of each component we proposed. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Technology, Tsinghua University, Beijing, China (Mingxuan Jing <jingmingxuan@outlook.com>; Wenbing Huang <hwenbing@126.com>; Fuchun Sun <fcsun@tsinghua.edu.cn>); (2) THU-Bosch JCML Center; (3) University of California, Los Angeles, USA; (4) ByteDance AI Lab, Beijing, China; (5) MIT-IBM Watson AI Lab, USA. |
| Pseudocode | Yes | Algorithm 1 Option-GAIL |
| Open Source Code | No | The paper does not include an explicit statement or link for the release of its own source code. |
| Open Datasets | Yes | Hopper-v2 and Walker2d-v2: The Hopper-v2 and the Walker2d-v2 are two standardized continuous-time locomotion environments implemented in OpenAI Gym (Brockman et al., 2016) with the MuJoCo (Todorov et al., 2012) physics simulator. ... AntPush-v0: ...proposed in Nachum et al. (2018), ... CloseMicrowave2: The CloseMicrowave2 is a more challenging robot operation environment in RLBench (James et al., 2020). |
| Dataset Splits | No | The paper uses expert demonstrations for learning but does not specify a validation set or how the demonstrations are split into training and validation sets for the proposed model. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models or other computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper references various software and environments such as OpenAI Gym (Brockman et al., 2016), MuJoCo (Todorov et al., 2012), PPO (Schulman et al., 2017), DAC (Zhang & Whiteson, 2019), and RLBench (James et al., 2020), but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Specifically, we allow 4 available option classes for all environments, a Multi-Layer Perceptron (MLP) with hidden size (64, 64) to implement the policies of both levels on Hopper-v2, Walker2d-v2, and AntPush-v0, and (128, 128) on CloseMicrowave2; the discriminator is realized by an MLP with hidden size (256, 256) on all environments. |
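The reported setup fixes only the number of option classes and the MLP hidden sizes. The sketch below turns those numbers into concrete layer shapes and parameter counts, which helps anyone attempting a reproduction. It is a minimal sketch, not the authors' code. Several parts are assumptions: the network inputs (a high-level policy conditioned on the state and previous option, a low-level policy conditioned on the state and current option, and a discriminator over state, action, previous option, and current option), the one-hot option encoding, and the helper names `mlp_shapes`, `n_params`, `OptionGAILNets`, and `build`, which are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hidden sizes and option count as reported in the table; everything else is assumed.
POLICY_HIDDEN = {
    "Hopper-v2": (64, 64),
    "Walker2d-v2": (64, 64),
    "AntPush-v0": (64, 64),
    "CloseMicrowave2": (128, 128),
}
DISC_HIDDEN = (256, 256)  # discriminator hidden sizes, same on all environments
NUM_OPTIONS = 4           # option classes allowed in every environment


def mlp_shapes(in_dim: int, hidden: Tuple[int, ...], out_dim: int) -> List[Tuple[int, int]]:
    """Return (fan_in, fan_out) for each linear layer of a fully connected MLP."""
    dims = [in_dim, *hidden, out_dim]
    return list(zip(dims[:-1], dims[1:]))


def n_params(shapes: List[Tuple[int, int]]) -> int:
    """Count weights plus biases across all linear layers."""
    return sum(i * o + o for i, o in shapes)


@dataclass
class OptionGAILNets:
    high: List[Tuple[int, int]]  # hypothetical pi_H(o_t | s_t, o_{t-1}) -> option logits
    low: List[Tuple[int, int]]   # hypothetical pi_L(a_t | s_t, o_t) -> action mean
    disc: List[Tuple[int, int]]  # hypothetical D(s_t, a_t, o_{t-1}, o_t) -> scalar score


def build(env: str, obs_dim: int, act_dim: int, k: int = NUM_OPTIONS) -> OptionGAILNets:
    """Layer shapes for both policy levels and the discriminator (inputs are assumed)."""
    h = POLICY_HIDDEN[env]
    # Assumption: each option enters a network as a one-hot vector of length k.
    return OptionGAILNets(
        high=mlp_shapes(obs_dim + k, h, k),
        low=mlp_shapes(obs_dim + k, h, act_dim),
        disc=mlp_shapes(obs_dim + act_dim + 2 * k, DISC_HIDDEN, 1),
    )
```

Gym's Hopper-v2 has 11-dimensional observations and 3-dimensional actions. Under these assumptions, `build("Hopper-v2", 11, 3)` gives a high-level policy with 5,444 parameters, a low-level policy with 5,379, and a discriminator with 71,937. The (256, 256) discriminator therefore holds most of the model's parameters.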