Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Adversarial Option-Aware Hierarchical Imitation Learning
Authors: Mingxuan Jing, Wenbing Huang, Fuchun Sun, Xiaojian Ma, Tao Kong, Chuang Gan, Lei Li
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method on several robotic locomotion and manipulation tasks against state-of-the-art HIL/IL baselines. The results demonstrate that our approach attains both dramatically faster convergence and better final performance over the counterparts. A complete set of ablation studies also verify the validity of each component we proposed. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Technology, Tsinghua University, Beijing, China (Mingxuan Jing <EMAIL>; Wenbing Huang <EMAIL>; Fuchun Sun <EMAIL>) 2THU-Bosch JCML center 3University of California, Los Angeles, USA 4Bytedance AI Lab, Beijing, China 5MIT-IBM Watson AI Lab, USA. |
| Pseudocode | Yes | Algorithm 1 Option-GAIL |
| Open Source Code | No | The paper does not include an explicit statement or link for the release of its own source code. |
| Open Datasets | Yes | Hopper-v2 and Walker2d-v2: The Hopper-v2 and the Walker2d-v2 are two standardized continuous-time locomotion environments implemented in the OpenAI Gym (Brockman et al., 2016) with the MuJoCo (Todorov et al., 2012) physics simulator. ... AntPush-v0: ...proposed in Nachum et al. (2018), ... CloseMicrowave2: The CloseMicrowave2 is a more challenging robot operation environment in RLBench (James et al., 2020). |
| Dataset Splits | No | The paper mentions using expert demonstrations for learning but does not specify a validation set or how the demonstrations are split into training and validation data for the proposed model. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models or other computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper references various software and environments like OpenAI Gym (Brockman et al., 2016), MuJoCo (Todorov et al., 2012), PPO (Schulman et al., 2017), DAC (Zhang & Whiteson, 2019), and RLBench (James et al., 2020), but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Specifically, we allow 4 available option classes for all environments, a Multi-Layer Perceptron (MLP) with hidden size (64, 64) to implement the policies of both levels on Hopper-v2, Walker2d-v2, AntPush-v0, and (128, 128) on CloseMicrowave2; the discriminator is realized by an MLP with hidden size (256, 256) on all environments. |
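The Experiment Setup row fixes the hidden-layer sizes and the number of options but not how each network is wired to states, actions, and options. The sketch below collects the reported hyperparameters into one Python config and counts the parameters of each fully connected network. Several parts are assumptions for illustration, not the authors' code (the paper does not release any): the one-hot option conditioning, the discriminator's inputs, the hypothetical names `MLPSpec` and `build_specs`, and the spelling of the environment-name keys.

```python
from dataclasses import dataclass

# Hyperparameters as reported in the Experiment Setup row.
# Key spellings are our own normalization of the environment names.
POLICY_HIDDEN = {
    "Hopper-v2": (64, 64),
    "Walker2d-v2": (64, 64),
    "AntPush-v0": (64, 64),
    "CloseMicrowave2": (128, 128),
}
DISCRIMINATOR_HIDDEN = (256, 256)  # same on all environments
NUM_OPTIONS = 4                    # option classes available in every environment


@dataclass(frozen=True)
class MLPSpec:
    """Hypothetical description of a fully connected network."""
    in_dim: int
    hidden: tuple
    out_dim: int

    def layer_sizes(self):
        return [self.in_dim, *self.hidden, self.out_dim]

    def num_params(self):
        # Each linear layer has (fan_in * fan_out) weights plus fan_out biases.
        sizes = self.layer_sizes()
        return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


def build_specs(env, obs_dim, act_dim):
    """Hypothetical wiring (an assumption, not taken from the paper):
    - high-level policy: state + one-hot previous option -> option logits
    - low-level policy:  state + one-hot current option  -> action outputs
    - discriminator:     state + action + one-hot option -> single score
    """
    hidden = POLICY_HIDDEN[env]
    return {
        "high_policy": MLPSpec(obs_dim + NUM_OPTIONS, hidden, NUM_OPTIONS),
        "low_policy": MLPSpec(obs_dim + NUM_OPTIONS, hidden, act_dim),
        "discriminator": MLPSpec(
            obs_dim + act_dim + NUM_OPTIONS, DISCRIMINATOR_HIDDEN, 1
        ),
    }


if __name__ == "__main__":
    # Hopper-v2 in Gym has 11-dimensional observations and 3-dimensional actions.
    for name, spec in build_specs("Hopper-v2", obs_dim=11, act_dim=3).items():
        print(name, spec.layer_sizes(), spec.num_params())
```

Under these assumptions, each Hopper-v2 policy network has roughly 5.4k parameters and the discriminator roughly 71k, so the discriminator's (256, 256) hidden layers dominate the model size.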