Adversarial Option-Aware Hierarchical Imitation Learning

Authors: Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei Li

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed method on several robotic locomotion and manipulation tasks against state-of-the-art HIL/IL baselines. The results demonstrate that our approach attains both dramatically faster convergence and better final performance over the counterparts. A complete set of ablation studies also verifies the validity of each component we proposed.
Researcher Affiliation | Collaboration | (1) Department of Computer Science and Technology, Tsinghua University, Beijing, China (Mingxuan Jing <jingmingxuan@outlook.com>; Wenbing Huang <hwenbing@126.com>; Fuchun Sun <fcsun@tsinghua.edu.cn>); (2) THU-Bosch JCML center; (3) University of California, Los Angeles, USA; (4) Bytedance AI Lab, Beijing, China; (5) MIT-IBM Watson AI Lab, USA.
Pseudocode | Yes | Algorithm 1 Option-GAIL
Open Source Code | No | The paper does not include an explicit statement or link for the release of its own source code.
Open Datasets | Yes | Hopper-v2 and Walker2d-v2: The Hopper-v2 and the Walker2d-v2 are two standardized continuous-time locomotion environments implemented in the OpenAI Gym (Brockman et al., 2016) with the MuJoCo (Todorov et al., 2012) physics simulator. ... AntPush-v0: ...proposed in Nachum et al. (2018), ... CloseMicrowave2: The CloseMicrowave2 is a more challenging robot operation environment in RLBench (James et al., 2020).
Dataset Splits | No | The paper mentions using expert demonstrations for 'learning' but does not specify a validation set or how the demonstrations are split between training and validation of the proposed model.
Hardware Specification | No | The paper does not specify any hardware details, such as GPU/CPU models or other computing infrastructure used for the experiments.
Software Dependencies | No | The paper references software and environments including OpenAI Gym (Brockman et al., 2016), MuJoCo (Todorov et al., 2012), PPO (Schulman et al., 2017), DAC (Zhang & Whiteson, 2019), and RLBench (James et al., 2020), but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | Specifically, we allow 4 available option classes for all environments, a Multi-Layer Perceptron (MLP) with hidden size (64, 64) to implement the policies of both levels on Hopper-v2, Walker2d-v2, AntPush-v0, and (128, 128) on CloseMicrowave2; the discriminator is realized by an MLP with hidden size (256, 256) on all environments.
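The experiment setup quoted above can be written down as a small configuration sketch. This is only an illustration of the reported sizes, not the authors' code (no source release is stated): the names OptionGAILConfig, config_for, and mlp_layer_shapes are hypothetical, and the input and output dimensions in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OptionGAILConfig:
    """Hypothetical container for the sizes quoted above (not the authors' code)."""
    n_options: int = 4                                  # option classes, all environments
    policy_hidden: Tuple[int, ...] = (64, 64)           # MLPs for the policies of both levels
    discriminator_hidden: Tuple[int, ...] = (256, 256)  # discriminator MLP, all environments


def config_for(env_name: str) -> OptionGAILConfig:
    """Return the reported sizes for one of the four environments."""
    # Normalise spacing and case so that spelling variants of a name still match.
    key = env_name.replace(" ", "").lower()
    if key == "closemicrowave2":
        return OptionGAILConfig(policy_hidden=(128, 128))
    if key in ("hopper-v2", "walker2d-v2", "antpush-v0"):
        return OptionGAILConfig()
    raise ValueError(f"no reported setup for {env_name!r}")


def mlp_layer_shapes(in_dim: int, hidden: Tuple[int, ...], out_dim: int) -> List[Tuple[int, int]]:
    """List the (fan_in, fan_out) pair of each linear layer in an MLP."""
    dims = (in_dim, *hidden, out_dim)
    return list(zip(dims[:-1], dims[1:]))
```

For example, with placeholder dimensions of 11 inputs and 3 outputs, mlp_layer_shapes(11, config_for("Hopper-v2").policy_hidden, 3) gives the layer shapes (11, 64), (64, 64), (64, 3). Activation functions, learning rates, and other hyperparameters are not part of the quoted setup and are deliberately left out.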